Abstract. Viewing Dehn's algorithm as a rewriting system, we generalise
to allow an alphabet containing letters which do not necessarily represent 
group elements. This extends the class of groups for which the algorithm 
solves the word problem to include nilpotent groups, many relatively hyper- 
bolic groups including geometrically finite groups and fundamental groups of 
certain geometrically decomposable manifolds. The class has several nice clo- 
sure properties. We also show that if a group has an infinite subgroup and 
one of exponential growth, and they commute, then it does not admit such an 
algorithm. We dub these Cannon's algorithms. 



1. Introduction 

1.1. Dehn's algorithm. Early last century Dehn [5] introduced three problems. 
We know them now as the word problem, the conjugacy problem and the isomor- 
phism problem. Given a finitely generated group G and generating set Q, we have 
solved the word problem if we can give a procedure which determines, for each 
word w ∈ Q*, whether or not w represents the identity. We have solved the
conjugacy problem if we can give a procedure which determines, for each pair of
words u, v ∈ Q*, whether they represent elements which are conjugate in G. For
the isomorphism problem, Dehn invites us to develop procedures for determining
if two
given groups are isomorphic. 

Using hyperbolic geometry Dehn proceeded to solve the word and conjugacy 
problems for the fundamental groups of closed hyperbolic surfaces. Let us take a 
moment to describe his solution of the word problem. For specificity, let us take 
the two-holed surface group 

⟨x_1, y_1, x_2, y_2 | [x_1, y_1][x_2, y_2]⟩.

The Cayley graph of this group sits in the hyperbolic plane H^2 as the 1-skeleton
of the tessellation of H^2 by regular hyperbolic octagons, and the relator
R = [x_1, y_1][x_2, y_2] labels
the boundary of each octagon. A word w now lies along the boundaries of these 
octagons and is a closed curve if and only if it represents the identity. Dehn then 
shows that any reduced closed curve travels around the far side of some "outermost" 
octagon and in doing so contains at least 5 of its 8 edges. That is, each reduced 
word representing the identity contains more than half of a relator. (Here we are 
allowing cyclic permutations of R and R^{-1}.)

This solves the word problem, for we can decompose the relator as uv^{-1} where
u appears in w = xuy and u is longer than v. This allows us to replace w with
the shorter word w' = xvy. If the word w represents the identity and w' is not
empty, we can again shorten w' in a similar manner. This process either ends with
a non-empty word which we cannot shorten, in which case w did not represent the 
identity, or with the empty word in which case w did represent the identity. 
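
As an illustration, the shortening procedure just described can be sketched as
follows. This is a minimal sketch, not Dehn's own formulation: the encoding
(a = x_1, b = y_1, c = x_2, d = y_2, capital letters for inverses) and all
function names are our assumptions, and free reduction is carried out
explicitly between replacements.

```python
def inverse(word):
    """Formal inverse: reverse the word and swap the case of each letter."""
    return word[::-1].swapcase()


def free_reduce(word):
    """Cancel adjacent inverse pairs such as 'aA' or 'Aa'."""
    stack = []
    for letter in word:
        if stack and stack[-1] == letter.swapcase():
            stack.pop()
        else:
            stack.append(letter)
    return "".join(stack)


def dehn_rules(relator):
    """Rules u -> v, with u more than half of a cyclic permutation of R or R^-1."""
    rules = {}
    n = len(relator)
    for r in (relator, inverse(relator)):
        for shift in range(n):
            cyc = r[shift:] + r[:shift]
            for k in range(n // 2 + 1, n + 1):
                # cyc = u * rest is trivial in the group, so u = rest^-1.
                rules[cyc[:k]] = inverse(cyc[k:])
    return rules


def dehn_reduce(word, rules):
    """Shorten the word until no rule applies; return what is left."""
    word = free_reduce(word)
    by_length = sorted(rules, key=len, reverse=True)
    while True:
        for u in by_length:
            i = word.find(u)
            if i >= 0:
                word = free_reduce(word[:i] + rules[u] + word[i + len(u):])
                break
        else:
            return word  # no left-hand side occurs any more


# Assumed encoding of the relator [x1, y1][x2, y2].
R = "abABcdCD"
RULES = dehn_rules(R)
```

For example, `dehn_reduce("c" + R + "C", RULES)` returns the empty word, while
`dehn_reduce("abAB", RULES)` returns `"abAB"` unchanged, since that word
contains no more than half of any relator.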

Accordingly, we say that the group G has a Dehn's algorithm if it has a finite
presentation

⟨Q | ℛ⟩

such that every word w ∈ Q* representing the identity contains more than half of
some relator in ℛ. Equivalently, we could write ℛ as a finite set of relations
u_i = v_i so that for each i, ℓ(u_i) > ℓ(v_i) and every word w ∈ Q* representing
the identity contains some u_i.

It is a theorem [17] [5] [1] that a group has such a Dehn's algorithm if and only if 
it is one of those groups which are variously called Gromov hyperbolic, hyperbolic, 
negatively curved or word hyperbolic. 

1.2. A new definition. Cannon [B] suggested we take the following viewpoint. We
have a class of machines designed to carry out¹ Dehn's algorithm. Such a machine
would be equipped with a finite set of length reducing replacement rules
u_i → v_i. It would have a window of finite width through which it would examine
a given word. This window would start at the beginning of the word. As the window
moved along, the machine would scan the word looking for occurrences of u_i's. If
it fails to find any u_i and is not already at the end of the word, it moves
forward. If it finds a u_i it replaces it with the corresponding v_i. (The blank
spaces magically evaporate.) The window then moves backwards one letter less than
the length of the longest u_i or to the beginning of the word if that is closer.
It accepts a word if and only if it
succeeds in reducing that word to the empty word. 
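
The machine just described can be simulated directly. The following is a
minimal sketch under our own assumptions: the text does not say which u_i is
chosen when several are visible in the window, so we take the one ending first
(longest in case of a tie), and the rule set FREE (free cancellation on two
generators) is purely illustrative.

```python
def find_in_window(window, rules):
    """Return (start, u) for the left-hand side ending first in the window."""
    for end in range(1, len(window) + 1):
        for u in sorted(rules, key=len, reverse=True):
            if window[:end].endswith(u):
                return end - len(u), u
    return None


def run_machine(word, rules):
    """Slide a window of width W over the word, replacing as we go."""
    width = max(len(u) for u in rules)
    pos = 0  # left edge of the window
    while True:
        hit = find_in_window(word[pos:pos + width], rules)
        if hit is not None:
            start, u = hit
            word = word[:pos + start] + rules[u] + word[pos + start + len(u):]
            pos = max(0, pos - (width - 1))  # back up, but not past the start
        elif pos + width < len(word):
            pos += 1                          # nothing visible: move forward
        else:
            return word                       # the window has reached the end


def accepts(word, rules):
    """The machine accepts exactly the words it reduces to the empty word."""
    return run_machine(word, rules) == ""


# Hypothetical rule set: free cancellation in the free group on a, b.
FREE = {"aA": "", "Aa": "", "bB": "", "Bb": ""}
```

With these rules, `accepts("abBA", FREE)` holds while `accepts("abAB", FREE)`
does not.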

The key difference here is that our working alphabet is no longer restricted to the 
group generators. We shall see that there are several different classes of machines 
here with some rather divergent properties. We do not know if these competing 
definitions for the title of "Dehn machine" yield different classes of groups. Our 
most restrictive version solves the word problem in a much larger class of groups 
than the word hyperbolic groups. 

We describe these classes of machines in terms of rewritings that they carry 
out. In each of these, we are supplied with an alphabet A and a finite set of 
pairs (u_i, v_i) ∈ A* × A* where for each i, ℓ(u_i) > ℓ(v_i). We call these rewriting
rules and write u_i → v_i. We call u_i and v_i the left-hand side and the right-hand
side respectively. For technical reasons we also have to allow the machines to have 
anchored rules: these are rules which only apply when the left-hand side is an initial 
segment of the current word. We write ~u for the left-hand side of an anchored rule 
and consider u and ~u to be distinct. 

¹ Morally, Dehn's algorithm represents a linear time solution to the word problem, but this
actually depends on the machine implementation. If it is implemented on a classical one-tape
Turing machine, the running time is O(n^2) due to the need to exorcize (or traverse) the blanks
left by each replacement. If it is implemented on a random access machine, it is O(n log n) due
to the size of the words needed to indicate addresses. If it is implemented on a multi-tape
machine it is O(n) since here blanks "evaporate" between the tapes [10]. Recently, [13] has shown
that there is a real-time multi-tape implementation.

Let S be a finite set of rewriting rules such that each left-hand side appears at
most once. We say that w ∈ A* is reduced with respect to S if it contains none of
the left-hand sides in S. The following algorithm, which we call the incremental
rewriting algorithm given by (A, S), replaces any w ∈ A* by a reduced word in
finitely many steps. If w contains a left-hand side, find one which ends closest to
the start of w; if several end at the same letter, choose the longest; if possible,
choose an anchored one in preference to a non-anchored one of the same length.
Replace it by the corresponding right-hand side. Repeat until w is reduced.
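
The selection rule of the incremental rewriting algorithm can be sketched as
follows. This is a minimal sketch, assuming a representation of our own
choosing: each rule is a triple (left-hand side, right-hand side, anchored),
and the example rules are hypothetical.

```python
def find_incremental(word, rules):
    """Return (start, lhs, rhs) for the rule to apply next, or None."""
    for end in range(1, len(word) + 1):          # earliest ending first
        best = None
        for lhs, rhs, anchored in rules:
            start = end - len(lhs)
            if start < 0 or word[start:end] != lhs:
                continue
            if anchored and start != 0:          # anchored rules need a prefix
                continue
            key = (len(lhs), anchored)           # longest, then anchored
            if best is None or key > best[0]:
                best = (key, start, lhs, rhs)
        if best is not None:
            return best[1:]
    return None


def reduce_incremental(word, rules):
    """Apply rules until no left-hand side remains; terminates because
    every rule is strictly length reducing."""
    while (hit := find_incremental(word, rules)) is not None:
        start, lhs, rhs = hit
        word = word[:start] + rhs + word[start + len(lhs):]
    return word


# Hypothetical rules: an anchored and a non-anchored rule with the same letters.
ANCHORED_DEMO = [("ab", "c", True), ("ab", "d", False)]
# Hypothetical nested left-hand sides: "b" ends before "abc" does.
NESTED_DEMO = [("abc", "x", False), ("b", "", False)]
```

Here `reduce_incremental("abab", ANCHORED_DEMO)` gives `"cd"`: the anchored rule
fires on the initial segment and the non-anchored one on the second occurrence.
With `NESTED_DEMO`, the word `"abc"` reduces to `"ac"`, because the shorter
left-hand side ends first.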

Here is a slightly different definition: the non-incremental rewriting algorithm 
given by (A, S), replaces any w ∈ A* by a reduced word in finitely many steps. Here
S may also include end-anchored rules, with left-hand side u~, and rules anchored 
at both ends. If w contains a left-hand side, find one which starts closest to the start 
of w; if several start at the same letter, choose the longest; prefer anchored rules 
when there is a choice. Replace it by the corresponding right-hand side. Repeat 
until w is reduced. 
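
For comparison, here is a minimal sketch of the non-incremental selection rule.
The representation is our assumption: each rule is a 4-tuple (left-hand side,
right-hand side, start-anchored, end-anchored), and "prefer anchored rules" is
read as preferring, among left-hand sides of equal length, the one with more
anchors.

```python
def find_non_incremental(word, rules):
    """Return (start, lhs, rhs) for the rule to apply next, or None."""
    for start in range(len(word)):               # earliest start first
        best = None
        for lhs, rhs, at_start, at_end in rules:
            end = start + len(lhs)
            if word[start:end] != lhs:
                continue
            if (at_start and start != 0) or (at_end and end != len(word)):
                continue
            key = (len(lhs), at_start + at_end)  # longest, then most anchored
            if best is None or key > best[0]:
                best = (key, start, lhs, rhs)
        if best is not None:
            return best[1:]
    return None


def reduce_non_incremental(word, rules):
    """Apply rules until no applicable left-hand side remains."""
    while (hit := find_non_incremental(word, rules)) is not None:
        start, lhs, rhs = hit
        word = word[:start] + rhs + word[start + len(lhs):]
    return word


# Hypothetical nested left-hand sides: "abc" starts before "b" does.
NESTED_DEMO = [("abc", "x", False, False), ("b", "", False, False)]
# Hypothetical end-anchored rule: delete a final "a" only.
END_DEMO = [("a", "", False, True)]
```

With `NESTED_DEMO` the word `"abc"` now reduces to `"x"`, since the longer
left-hand side starts first, and `reduce_non_incremental("aba", END_DEMO)`
gives `"ab"`.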

Each of these algorithms gives a reduction map R = R_S : A* → A* where R(w)
is the reduced word which the algorithm produces starting with w. The incremental 
rewriting algorithm gets its name from the following property: if R is the reduction 
map of an incremental rewriting algorithm, then R(uv) = R(R(u)v). 

We may wish to apply an incremental rewriting algorithm only to words in A_0*
where A_0 ⊆ A. We then refer to A_0 as the input alphabet and A as the working
alphabet. The algorithm can then be given as a triple (A_0, A, S). We say that
{w ∈ A_0* | R(w) is empty} is the language of this triple. The same can be done for
non-incremental rewriting algorithms.²

Clearly Dehn's algorithm can be implemented as an incremental rewriting
algorithm, with A_0 = A = Q and S given by the rules u_i → v_i. We generalize this
as follows. (See Section 3 for the example which originally motivated this definition.)

Definition 1.1. A group G, with semi-group generators Q, has a Cannon's
algorithm if there exists an alphabet A ⊇ Q and a set of rewriting rules S over A,
such that the incremental rewriting algorithm reduces g ∈ Q* to the empty word if
and only if g represents the identity in G.

We have chosen incremental rewriting algorithms because of their nice group 
theoretic properties. Using incremental rewriting algorithms in the previous defini- 
tion ensures that the Cannon's algorithm remembers group elements. That is, if G 
has a Cannon's algorithm with input alphabet Q and reduction map R, and there 
are x and y in Q* so that R(x) = R(y), then x and y denote the same element of
G. This property does not hold in general if one uses non-incremental rewriting 
algorithms. 

On the other hand, non-incremental rewriting algorithms have nice language
theoretic properties in that they support composition. In the following, we will
conceal some technical details in the word "mimics". One can imagine the
non-incremental rewriting algorithm as being carried out by a machine with a finite
number of internal states s_i and a list of rewriting rules S_i for each state s_i.
There is a non-incremental rewriting algorithm which mimics the action of this
multi-state machine. Consequently, given two non-incremental rewriting algorithms
over the same alphabet A with reduction maps Q and R, there is a non-incremental
rewriting algorithm which mimics a non-incremental rewriting algorithm whose
reduction map is R ∘ Q. We will refer to a Cannon's algorithm carried out using a
non-incremental rewriting algorithm as a non-incremental Cannon's algorithm.

² Since this work first appeared in preprint form, Mark Kambites and Friedrich Otto [16] have
shown that the incremental rewriting algorithm languages are contained in the set of
Church-Rosser languages and that a language is a non-incremental rewriting algorithm
language if and only if it is a Church-Rosser language.

1.3. Results. Before describing our results, we note that many of these were inde- 
pendently rediscovered by Mark Kambites and Friedrich Otto [15]. We show here 
that groups with Cannon's algorithms have the following closure properties: 

(1) If G has a Cannon's algorithm over one finite generating set then it has a 
Cannon's algorithm over any finite generating set. 

(2) If G has a Cannon's algorithm and G is a finite index subgroup of H then 
H has a Cannon's algorithm. 

(3) If G and H have Cannon's algorithms, then G*H has a Cannon's algorithm. 

(4) If G has a Cannon's algorithm and H is a finitely generated subgroup of G
then H has a Cannon's algorithm. 

This last closure property significantly increases the class of groups with Can- 
non's algorithms. Every word hyperbolic group has a Cannon's algorithm, and as 
Bridson and Miller have pointed out to us, the finitely generated subgroups of word 
hyperbolic groups include groups which are not finitely presented and groups with 
unsolvable conjugacy problem [2]. 

We also show that groups with Cannon's algorithms include 

(1) finitely generated nilpotent groups, 

(2) many relatively hyperbolic groups including geometrically finite hyperbolic 
groups, and fundamental groups of graph manifolds all of whose pieces are 
hyperbolic. 

We prove the first of these by means of expanding endomorphisms. The parade 
example of an expanding endomorphism is the endomorphism of the integers
n ↦ 10n. The facts that this map makes everything larger and that its image is finite
index combine to give us decimal notation. Our Cannon's algorithms for nilpotent 
groups consist of this sort of decimalization together with cancellation. We are then 
able to combine these methods with the usual word hyperbolic Cannon's algorithms 
to produce the second class of results. 

We are also able to prove that many groups do not have Cannon's algorithms. 
We have the following criterion: suppose G has two subsets, S_1 and S_2, and that
both of these are infinite and the growth of S_2 is exponential. Suppose also that
these two sets commute. Then G does not have a Cannon's algorithm. This allows
us to rule out many classes of groups including Baumslag-Solitar groups, braid
groups, Thompson's group, solvegeometry groups and the fundamental groups of 
most Seifert fibered spaces. In particular, we are able to say exactly which graph 
manifolds have fundamental groups which have Cannon's algorithms. 

We have discussed Cannon's algorithms which are carried out by incremental 
rewriting algorithms and non-incremental rewriting algorithms. They can also be 
carried out non-deterministically. Given a finite set of length reducing rewriting 
rules, these solve the word problem nondeterministically if for each word w, w 
represents the identity if and only if it can be rewritten to the empty word by some 
application of these rules. All of these competing versions are closely related to the 
family of growing context sensitive languages. A growing context-sensitive grammar 
is one in which all the productions are strictly length increasing. It is a theorem 
that a language L is a growing context-sensitive language if and only if there is a 
symbol s and a set of length reducing rewriting rules such that a word w is in L if 
and only if it can be rewritten to s by some application of these rules. While the
family of languages with non-deterministic Cannon's algorithms and the family of 
growing context-sensitive languages may not be exactly the same, our criterion for 
showing that a group does not have a Cannon's algorithm also seems likely to show 
that its word problem is not growing context-sensitive. Now all automatic groups 
(and their finitely generated subgroups) have context-sensitive word problems [22].
Thus extending this result to the non-deterministic case would show that the class 
of groups with growing context-sensitive word problem is a proper subclass of those 
with context-sensitive word problem.³
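
The non-deterministic notion of acceptance discussed above can be made concrete
by an exhaustive search over all ways of applying the rules. This is only a
sketch: the function name and the three example rules are hypothetical, and the
search may take exponential time, so it is meant for small experiments rather
than as a practical algorithm.

```python
def can_reach_empty(word, rules):
    """Decide whether SOME sequence of rule applications deletes the word.

    The search terminates because every rule is strictly length reducing,
    so only finitely many words are reachable.
    """
    seen = {word}
    stack = [word]
    while stack:
        w = stack.pop()
        if w == "":
            return True
        for lhs, rhs in rules:
            i = w.find(lhs)
            while i >= 0:                       # try every occurrence of lhs
                nxt = w[:i] + rhs + w[i + len(lhs):]
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
                i = w.find(lhs, i + 1)
    return False


# Hypothetical rules for which the order of application matters:
# "abc" -> "cc" is a dead end, but "abc" -> "ad" -> "" succeeds.
CHOICE_DEMO = [("ab", "c"), ("bc", "d"), ("ad", "")]
```

With these rules `can_reach_empty("abc", CHOICE_DEMO)` holds even though a
deterministic strategy that always rewrites the earliest-ending left-hand side
would get stuck at `"cc"`.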

1.4. Thanks. We wish to thank Gilbert Baumslag, Jason Behrstock, Brian Bowditch, 
Martin Bridson, Bill Floyd, Swarup Gadde, Bob Gilman, Susan Hermiller, Craig 
Hodgson, Chuck Miller, Walter Neumann and Kim Ruane for helpful conversations. 
We also wish to give special thanks to Jim Cannon for suggesting the key idea of 
this work to us during a conference at the ANU in 1996, and for working with us 
during the evolution of this paper. 

2. Basic Properties 
Let us start by justifying the term incremental rewriting algorithm. 

Lemma 2.1. Let R : A* → A* denote reduction by a fixed incremental rewriting
algorithm. Then for all u, v ∈ A*, R(uv) = R(R(u)v).

Proof. If a substitution can be made in u, the same substitution will be made in 
uv. Therefore, in exactly the number of steps the algorithm takes to change u into 
R(u), it changes uv into R(u)v. This shows that R(u)v is an intermediate result of
running the algorithm on uv. It follows that both must reduce to the same eventual
result, i.e., R(uv) = R(R(u)v). □

Proposition 2.2. Let R denote reduction with respect to a Cannon's algorithm
(Q, A, S) for G. Let x, y be words in Q* such that R(x) = R(y). Then x and y
represent the same element of G.

Proof. If R(x) = R(y) then R(x)y^{-1} = R(y)y^{-1}, from which it follows, by
Lemma 2.1, that R(xy^{-1}) = R(yy^{-1}), which equals the empty word. But since R
comes from a Cannon's algorithm, this implies that x and y represent the same
group element. □
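
Spelled out, the chain of equalities behind this proof reads as follows. Here
we assume that y^{-1} denotes some fixed word in Q* representing the inverse of
y (such a word exists since Q generates G as a semi-group), and we write
\varepsilon for the empty word; both conventions are ours.

```latex
\begin{align*}
R(x y^{-1}) &= R\bigl(R(x)\,y^{-1}\bigr) && \text{(Lemma 2.1)}\\
            &= R\bigl(R(y)\,y^{-1}\bigr) && \text{(since } R(x) = R(y)\text{)}\\
            &= R(y y^{-1})               && \text{(Lemma 2.1)}\\
            &= \varepsilon               && \text{(since } y y^{-1} \text{ represents the identity).}
\end{align*}
```

Since R deletes exactly the words representing the identity, x y^{-1} represents
the identity, so x and y represent the same element of G.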

This means that a Cannon's algorithm always remembers what element of the 
group it was fed. In a sense this tells us that R(x) is a kind of "canonical form" for
x ∈ Q*.



As we shall see, Proposition 2.2 does not hold for non-incremental Cannon's
algorithms. The following proposition shows that the incremental rewriting algorithms
form a subclass of the non-incremental ones. 

Proposition 2.3. Given rewriting rules (A, S) there is a set of rewriting rules 
(A, S') such that the non-incremental rewriting algorithm of (A, S') carries out
exactly the same substitutions as the incremental rewriting algorithm of (A, S).



³ Examples of groups with context-sensitive word problem, but not growing context-sensitive
word problem are given in [16]. In work in progress (joint with Derek Holt and Sarah Rees)
we show that a language is growing context-sensitive if and only if it is the language of a
non-deterministic Cannon's algorithm. In addition, we show that the methods of Sections 6 and 7
extend to these non-deterministic Cannon's algorithms. This has additional language-theoretic
consequences.






OLIVER GOODMAN AND MICHAEL SHAPIRO 



Proof. Suppose we carry out the non-incremental rewriting algorithm given by
(A, S). In what situation would it make a different substitution to that chosen
by the incremental rewriting algorithm? Clearly only when we encounter nested 
left-hand sides in our word. In that case the non-incremental algorithm chooses the 
longer word because it starts first, whereas the incremental algorithm chooses the 
shorter because it ends first. But this means that the incremental rewriting algo- 
rithm will never actually invoke the rule with the longer left-hand side. Therefore 
we can discard from S any rules whose left-hand sides contain another left-hand 
side ending before the last letter. Call the set of rules we obtain S'. Using these
rules both algorithms make exactly the same substitutions. □ 
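
The pruning step used in this proof is simple to write down. The sketch below
ignores anchoring for brevity (an assumption of ours, not of the proposition)
and represents the rule set as a dictionary from left-hand sides to right-hand
sides.

```python
def prune_nested(rules):
    """Discard every rule whose left-hand side contains another left-hand
    side ending before its last letter."""
    kept = {}
    for lhs, rhs in rules.items():
        # lhs[:-1] is lhs without its last letter: any other left-hand side
        # occurring there ends strictly before lhs does.
        shadowed = any(other != lhs and other in lhs[:-1] for other in rules)
        if not shadowed:
            kept[lhs] = rhs
    return kept


# Hypothetical rule sets.
SHADOWED_DEMO = {"abc": "x", "b": ""}     # "b" ends before "abc" does
SAME_END_DEMO = {"abc": "x", "bc": "y"}   # "bc" ends with "abc": both stay
```

Here `prune_nested(SHADOWED_DEMO)` keeps only the rule for `"b"`, which is the
only one the incremental algorithm could ever invoke, while
`prune_nested(SAME_END_DEMO)` keeps both rules.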

2.1. Rewriting algorithms and compression. The key result underlying the 
group theoretic properties of Cannon's algorithms is that if a group has a Cannon's 
algorithm with respect to one (finite) set of generators, it has one with respect to 
any other. 

Let Q and Q' be sets of semi-group generators for G, such that (Q, A, S) is a
Cannon's algorithm for G. Each element of Q' can be expressed as a word in Q*.
Let n be the length of the longest such word. Let A^{≤n} be the set of non-empty
words of length at most n in A*. We can use it as an alphabet, each of whose
letters encodes up to n letters of A. Since Q ⊆ A we can regard Q' as a subset of
A^{≤n}.

The writing out map from (A^{≤n})* to A* maps a word to the concatenation
of its letters. Lemma 2.4 shows that given (Q, A, S) we can construct an
algorithm (Q', A^{≤n}, S') which, by "mimicking" (Q, A, S), deletes its input
precisely when (Q, A, S) deletes the written out version of the same input.
Unfortunately the algorithm we give is not quite an incremental rewriting
algorithm: its rules are not strictly length decreasing. The main point of this
section is to explain how we can overcome this problem and give an incremental
rewriting algorithm which does what we want.
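
To fix ideas, the compressed alphabet and the writing out map can be sketched as
follows. The example is hypothetical: we take G to be the integers with
Q = {a, A} (A being the inverse of a) and Q' = {s, S, t, T}, where s and t stand
for a^2 and a^3, so that n = 3. A word over the compressed alphabet is
represented as a list of strings.

```python
from itertools import product


def alphabet_up_to(alphabet, n):
    """All non-empty words of length at most n, each used as a single letter."""
    return [
        "".join(p)
        for k in range(1, n + 1)
        for p in product(alphabet, repeat=k)
    ]


def write_out(compressed_word):
    """The writing out map: concatenate the letters of the compressed word."""
    return "".join(compressed_word)


def encode(word, expansions):
    """Regard a word over the new generators as a word over the compressed
    alphabet, one compressed letter per generator."""
    return [expansions[g] for g in word]


# Hypothetical second generating set for the integers, written in terms of a, A.
EXPANSIONS = {"s": "aa", "S": "AA", "t": "aaa", "T": "AAA"}
COMPRESSED = alphabet_up_to("aA", 3)
```

For instance `write_out(encode("sT", EXPANSIONS))` is `"aaAAA"`, and the
compressed alphabet has 2 + 4 + 8 = 14 letters.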

Lemma 2.4. Let (A_0, A, S) be an incremental (or non-incremental) rewriting
algorithm. Then for any integer n > 0 there exists a non-strictly length decreasing
incremental (resp. non-incremental) rewriting algorithm (A_0^{≤n}, A^{≤n}, S') with
the following property. For each word w ∈ (A_0^{≤n})*, the reduction of w with respect
to (A_0^{≤n}, A^{≤n}, S') written out, equals the reduction with respect to (A_0, A, S)
of w written out.

Proof. Let W be the length of the longest left-hand side in S. 

For an incremental algorithm, the set of left-hand sides in S' is the set of all
words of length less than or equal to W in (A^{≤n})*, with and without leading ~'s,
which, when written out, contain a left-hand side of S. For each such word, we
write it out, apply one substitution from (A_0, A, S), and write it back into (A^{≤n})*
to obtain the corresponding right-hand side; an anchored rule can only be applied
if the left-hand side starts with a ~.

That this can be done without making the right-hand side any longer in (A^{≤n})*
than the left-hand side should be clear: one case when the right-hand side cannot 
be any shorter is when the left-hand side is one letter long, and the substitution we 
make on the written out word does not entirely delete it. 

We have to check that, modulo writing out, the two algorithms carry out the
same substitutions. Let w, written out, contain a left-hand side u of S. Some
subword of w, adorned with a ~ if it is an initial segment, contains u, and is a
left-hand side in S'. The first S'-left-hand side can't end to the left of the end of
u, since it would then contain no S-left-hand side at all. Therefore the first
S'-left-hand side contains u, and is anchored if u is an initial segment. The rule
in S' for this left-hand side carries out the substitution in S for u.

For a non-incremental algorithm, the set of left-hand sides in S' is the set of
words U ∈ (A^{≤n})* of length less than or equal to W, with optional leading and
trailing ~'s, such that

(1) U written out contains a left-hand side of S, and

(2) if the first S-left-hand side in U starts fewer than W A-letters from the end
of U, then U ends with a ~.

Let w and u be as above. We can find an S'-left-hand side U in w which contains
u. Now u could have a subword u_0 which is also a left-hand side in S. In principle,
the first S'-left-hand side in w might contain u_0 but not u, but this is ruled out by
(2). Therefore the first S'-left-hand side in w contains u, and the corresponding
rule does the right substitution. □

We want to adjust this basic construction so as to obtain rules which are strictly
length decreasing. If each rule in S were to delete at least n letters, there would be 
no problem, but this will not generally be the case. When the input word has two 
or more letters we might write the result of a single substitution as a shorter word 
in (A^{≤(2n−1)})*. This doesn't really solve the problem since we end up working in
larger and larger alphabets. And what about a word of length 1 in (A^{≤n})* which
when written out and reduced, is non-empty? The algorithm we construct will not 
touch such a word unless it can delete it entirely. In fact, unless it can delete its 
input completely, it may stop short with some intermediate result of the original 
algorithm. This is fine since we only really care whether or not an input word is 
deleted completely. 

We return first to the original algorithm (A_0, A, S), and try to see to what extent
it can be made to remove several letters at a time when it substitutes. 

It is helpful to think of the algorithm as being carried out by a machine which 
views the word it is processing through a window of size W, where W is the length of 
the longest left-hand side in S. Since the incremental algorithm works by observing 
the earliest ending left hand side, one might imagine that the machine acts when a 
left hand side ends at the end of the window. Similarly, a non-incremental rewriting 
algorithm acts when a left-hand side starts at the start of the window. If there are 
no left-hand sides visible, the machine steps one letter to the right, or stops if it has 
reached the end of the word. If there is a left-hand side, it substitutes and steps 
W − 1 letters to the left.

Let w be a word containing a left-hand side and let u be the first such in w. Let 
us look at a subword U of w extending A ≥ W letters to the left of u, and B ≥ W
letters to the right, and see what the machine does. The machine's actions are
entirely determined by the contents of this A,B-neighborhood of u until such time
as it needs to examine letters either to the left or to the right of it. We say that the 
machine goes to the left or to the right accordingly. In the first case the machine 
must first make at least Lw^J + 1 substitutions. We call each substitution made 
in this way a subword reduction. 
If there are fewer than A letters to the left of u, U is an initial segment of w;
then the machine's actions are determined by the contents of U until it (inevitably) 
goes to the right. If there are fewer than B letters to the right, the machine can 
either go to the left or terminate. 

We make rules which carry out several substitutions at a time. The new left-hand 
sides are the reducible words with no more than A letters before the first left-hand 
side, and no more than B letters after it. The new right-hand sides are the result 
of running the machine on the left-hand sides until it goes to the left or the right.
If the new left-hand side has fewer than A letters before its first S-left-hand side,
we allow the machine to run until it goes to the right and make the resulting rule
be anchored at the start. If there are fewer than B letters after the S-left-hand
side, we allow the machine to run until it goes to the left or terminates; for a
non-incremental rewriting algorithm we make the resulting rule be end-anchored. We
call the rules we obtain left-going if the machine went to the left and right-going if 
it went to the right or terminated. 

Finally, let us discard all right-going rules which have a left-hand side with fewer
than B letters after the first S-left-hand side, and non-empty right-hand side. Let
S' contain all the remaining rules. We claim that as long as A ≥ B + W and
B ≥ W − 1, a machine using the rules S' still carries out the same substitutions but
may stop short of fully reducing the input word (with respect to S).

Let w be a word containing an S-left-hand side and let u be the first such in
w. We have to show that if w contains an S'-left-hand side then the first such
contains u. An S'-left-hand side can't end to the left of the end of u since in the
incremental case it would contain no S-left-hand side, while in the non-incremental
case it would have to be a non-end-anchored rule with fewer than B letters to the
right of its first S-left-hand side. Therefore if any S'-left-hand side contains u, the
first one in w does.

If we can find no S'-left-hand side containing u then the A,B-neighborhood of u
must be one of the deleted left-hand sides. In that case u ends within B letters of
the end of w. Since A ≥ B + W, any other rule which might apply, containing some
other S-left-hand side u_1 to the right of u, would also see u, which is a contradiction.
The fact that the rule for the A,B-neighborhood of u has been deleted means that
in this case the original algorithm would have terminated with a non-empty result.

With w and u as above, we define the reduction point of an incremental rewriting 
algorithm to be the right-hand edge of u, while for a non-incremental rewriting 
algorithm it is the left-hand edge of u. Each rule is either, 

(1) left-going, deleting at least [pjA-fJ + 1 letters, 

(2) right-going, deleting the whole left-hand side, or 

(3) right-going, shifting the reduction point at least B − (W − 1) letters to the
right, or out of the word entirely. 

Lemma 2.5. Let (A_0, A, S) be an incremental (or non-incremental) rewriting
algorithm. Then for any integer n > 0 there exists an incremental (resp.
non-incremental) rewriting algorithm (A_0^{≤n}, A^{≤(2n−1)}, S') with the following
property. For each word w ∈ (A_0^{≤n})*, the reduction of w, with respect to
(A_0^{≤n}, A^{≤(2n−1)}, S'), written out is an intermediate result of the reduction
of w written out with respect to (A_0, A, S). It is empty if and only if the latter
is also.
Proof. Let us first give names to parts of our new working alphabet. Let B be all
words in A^{≤(2n−1)} of length at most n, and let C be all longer words. Our new input
alphabet is a subset of B, and an input word is a word in B*.

At any given time during the running of our new algorithm the current word will 
satisfy the following conditions. No C-letters end (in the written out word) to the 
right of the reduction point. Any C-letters present will end at least (2n − 1)(W − 1)
original letters apart, i.e. they will be relatively sparse. 

We shall give rules that, modulo writing out, carry out subword reduction looking
at least A = 2nW + W original letters to the left of the first left-hand side and
B = 2nW letters to the right. The left-hand sides are words in (B ∪ C)* such that

(1) each is S-reducible when written out,

(2) each has up to A + (2n − 2) original letters before the first original left-hand
side and up to B + (n − 1) following it,

(3) any C-letters present come before the reduction point and are sparse, as 
noted above, 

(4) if there are fewer than A original letters before the first original left-hand
side, it starts with a ~, and

(5) (non-incremental case only) if there are fewer than B original letters
following the first original left-hand side, it ends with a ~.

Modulo writing out, these are the same left-hand sides as before except that we 
have to allow for the granularity of the B and C letters. 

To obtain each corresponding right-hand side we apply subword reduction to
the written out word for 2n − 1 steps or until subword reduction is complete if this
happens first: it follows that there will be no left-going rules. If 2n − 1 substitutions
were made (or the left-hand side was deleted entirely) we can write the result using
at least one fewer C-letters, or fewer B-letters if no C-letters were present. Since
the reduction point moves at most (2n − 1)(W − 1) original letters to the left, it
moves past at most one C-letter. Therefore we can write our right-hand side so as
to preserve the above conditions on the placement and sparsity of C-letters.

If subword reduction is complete before 2n − 1 substitutions have been made,
and the result is non-empty, it may be impossible to keep the number of C-letters
fixed and still write a length reducing rule. If this is the case, and there were fewer
than B original letters after the original left-hand side, we discard the rule entirely.
With B > 2n − 1 or more original letters after the left-hand side, only reductions
which remove fewer than n letters can force us to introduce a new C-letter. For an
incremental rewriting algorithm the new reduction point will be to the right of our
subword. By writing the new C-letter at the end of the right-hand side we ensure
that it ends at least (2nW − (n − 1)) letters to the right of the previous reduction
point. Since (2nW − (n − 1)) > (2n − 1)(W − 1) the sparsity of C-letters is preserved.
For a non-incremental rewriting algorithm the new reduction point could be up to
W − 1 letters in from the end of our right-hand side. Thus our new C-letter might
have to end up to W − 1 + n − 1 original letters from the end of the right-hand side.
This still puts it at least (2nW − 2(n − 1) − (W − 1)) > (2n − 1)(W − 1) letters to
the right of the previous reduction point.
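
The two inequalities used in this paragraph follow from a direct expansion,
which we record here for convenience; they hold for all positive integers n
and W.

```latex
\begin{align*}
(2n-1)(W-1) &= 2nW - 2n - W + 1,\\
\bigl(2nW - (n-1)\bigr) - (2n-1)(W-1) &= n + W > 0,\\
\bigl(2nW - 2(n-1) - (W-1)\bigr) - (2n-1)(W-1) &= 2 > 0.
\end{align*}
```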

The rules we have given are strictly length decreasing. They preserve the
conditions given on the placement of C-letters. Modulo writing out and working several
steps at a time, the rules apply the same substitutions as the original algorithm. If
a word is reducible when written out, either a rule will apply, or the word will be
a few steps away from being reduced with a non-empty result. It follows that the
new rules delete a word in (A_0^{≤n})* if and only if the original rules deleted the
same word written out. □



2.2. Composition of non-incremental rewriting algorithms. Let us intro- 
duce the notion of a finite state Dehn machine. As with rewriting algorithms these 
can be either incremental or non-incremental. (We describe the non-incremental 
version: to obtain the incremental version, read "ending at the current position" 
wherever the definition says "starting at the current position.") Such a machine 
comes with a finite collection of states, Q = {q_i}. One of these, q_0, is the start
state. For each state q ∈ Q there is a collection of length reducing replacement
rules S_q = {u_i → v_i}. There is also a transition function which chooses a new
state depending on the current state and the contents of the subword of length W 
starting at the current position, where W is an upper bound for the lengths of all 
the left-hand sides. 

Such a machine starts in the start state at the beginning of the input word. In 
state $q$, it looks at the next W letters for the longest left-hand side in $S_q$ starting
at the current position, and to determine its new state. It then either substitutes 
and steps W letters to the left, or steps one letter to the right. In either case it 
switches to the new state. It terminates when it reaches the end of the word with 
no further replacements possible. 
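
The stepping behaviour just described can be sketched in a few lines of Python. This is only an illustrative model of the non-incremental machine, under our own assumptions: the names `run_dehn_machine`, `rules` and `transition` are hypothetical, words are plain strings, and the rule table and transition function are supplied by the caller.

```python
def run_dehn_machine(word, rules, transition, start_state, W):
    """Run a (non-incremental) finite state Dehn machine on `word`.

    rules:      dict mapping a state to a dict {left-hand side: right-hand side};
                every rule is assumed length reducing with left-hand side <= W letters.
    transition: function (state, window) -> new state, where `window` is the
                subword of length at most W starting at the current position.
    Returns the final word and the state in which the machine stopped.
    """
    letters = list(word)
    pos, state = 0, start_state
    while pos < len(letters):
        window = "".join(letters[pos:pos + W])
        new_state = transition(state, window)
        # Longest left-hand side of the current state starting at `pos`, if any.
        candidates = [u for u in rules.get(state, {}) if window.startswith(u)]
        if candidates:
            lhs = max(candidates, key=len)
            letters[pos:pos + len(lhs)] = rules[state][lhs]
            pos = max(0, pos - W)      # substitute, then step W letters to the left
        else:
            pos += 1                   # no rule applies: step one letter to the right
        state = new_state
    return "".join(letters), state


# Toy usage: a one-state machine doing free reduction on the letters a, A, b, B.
free_reduction = {0: {"aA": "", "Aa": "", "bB": "", "Bb": ""}}
print(run_dehn_machine("abBA", free_reduction, lambda q, window: 0, 0, 2))
```

Since every substitution shortens the word and the position otherwise only moves right, the loop always terminates.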

Observe that when a Dehn machine with state terminates it does not necessarily 
leave behind a word which is free of left-hand sides. While Dehn machines with 
state are ostensibly more powerful than rewriting algorithms, we show that, by 
storing the state information in the current word, we can get a rewriting algorithm 
to "mimic" a Dehn machine. We then use Dehn machines with state to show that 
non-incremental rewriting algorithms have a nice composition property. 

We can extend the concept of writing out to include any map $A'^* \to A^*$ induced
by a map $A' \to A^*$. A machine stops short if it terminates at a point when all
remaining substitutions would have applied to a final segment of bounded length. 
One machine mimics another if the result of the mimic written out is always a 
result of the original stopping short. 

Proposition 2.6. Given a non-incremental (or incremental) finite state Dehn ma- 
chine, there is a non-incremental rewriting algorithm (resp. incremental rewriting 
algorithm) which mimics it. The mimic terminates with an empty word if and only 
if the finite state Dehn machine terminates with an empty word in its start state. 

Proof. We give first a non-strictly length decreasing rewriting algorithm. At the end
we sketch how the trick used in the proof of Lemma 2.5 of introducing widely spaced
"multi-letter" letters allows us to give strictly length-decreasing rules. The reason 
we prefer to give a non-strictly length decreasing algorithm here is that, while the 
details of making strictly length-decreasing rules are not hard, they would obscure 
the basically simple idea behind this proof. 

Let A be the working alphabet of our Dehn machine. We make copies of A in 
different colors, one corresponding to each state of the machine, and one more in 
white. The input alphabet, and the copy of A corresponding to the start state, 
we color indigo. At any given time during the running of the mimic algorithm an 
initial segment (possibly empty) of the current word is white. The first colored letter 
indicates a state of the Dehn machine and its current position, and the remaining
letters are all indigo. 

Let W be the length of the longest left-hand side of the Dehn machine. We 
specify the substitutions we wish the mimic to make rather than giving the precise 
rules. Look W letters to either side of the first colored letter. If a substitution is 
indicated (according to the state of the first colored letter) we make it, color up 
to W - 1 letters indigo, and one the color of the new state. If no substitution is
indicated, the first colored letter is turned white and the next letter is colored with 
the new state. A special case arises for rules which delete their whole left-hand side 
and do not lead to the start state. Since there is no suitable letter to color with 
the new state the mimic instead writes a colored blank. We then have to add a 
few more rules which take a colored blank followed by a letter and write the same 
letter in that color. 

It is not hard to see that the mimic and the Dehn machine make essentially 
the same substitutions. When the mimic terminates it is with a word that is white 
except for its final letter which indicates the termination state of the Dehn machine. 
If the Dehn machine terminates with an empty word, in a non-input state, the mimic 
leaves behind a single colored blank. 

To make these rules length decreasing we instead look 2W letters before and 
after the first colored letter in the incremental case (2W before and 3W after if
non-incremental). We run the Dehn machine as a subword reduction. If no substitutions
are made, the first colored letter is shifted at least 2W letters to the right and two
white letters are replaced by one encoding them both. If subword reduction goes 
to the left, enough substitutions will be made to allow us to remove any "double" 
letters we find on the way (these ending at least 2W original letters apart). If we 
can't see 2W (resp. 3W) letters to the right we may have to discard the relevant 
rule and allow the mimic machine to terminate a little prematurely. This only 
happens in cases where the machine is unable to make any further substitutions 
between the current point and the end of the word. □ 

Let (A, S) and (A, S') be non-incremental rewriting algorithms with reduction
maps P and P' respectively. Ideally there would then be a non-incremental rewrit- 
ing algorithm with reduction map P' o P. Unfortunately this doesn't appear quite 
to be the case. We have to allow the resulting algorithm to give its answer in some 
"compression alphabet" A*", and we may have to allow it to stop short of reaching 
its answer. We don't really mind the compression alphabet, but having a machine 
stop short is a problem: it gets in the way of doing any further composition. 

Reluctantly, we must add a further "flavor" of non-incremental rewriting
algorithm to our collection. A nearly strict non-incremental rewriting algorithm is one
which may include some length preserving ending rules: these are end anchored
rules such that the resulting algorithm has the property that one of these will
apply only when the word is reduced with respect to all the strictly length decreasing
rules, and afterwards the word will be fully reduced. We shall not consider here
the question of how to determine, in general, whether a given set of rules has this
property. What is hopefully clear is that if, in the proof of Lemma 2.5, we put
back the deleted rules, we obtain a nearly strict non-incremental rewriting
algorithm. Modulo writing out, the resulting algorithm achieves the reduction map
of the original algorithm. Furthermore it makes no difference to the proof if the
original algorithm is itself nearly strict.

Lemma 2.7. Let $(A_0, A, S)$ be a nearly strict non-incremental rewriting algorithm.
Then for any integer n > there exists a nearly strict non-incremental rewriting al- 
gorithm (Aq", A*''^"""'^', 5") with the following property. For each word w € (AJ")*, 
the reduction ofw, with respect to (Aq", A*'^"^"'^^ S"), written out is the reduction 
of w written out with respect to $(A_0, A, S)$. □

Similarly, when we construct a mimic for a non-incremental Dehn machine with 
state, we can avoid stopping prematurely by allowing ending rules for the resulting 
machine. We can also allow a Dehn machine with state to have ending rules. These 
are length preserving rules which put it into a terminal state, a state without rules 
which the machine cannot leave. Such a machine can also be mimicked by a nearly 
strict non-incremental rewriting algorithm, the proof being virtually unchanged. 

We shall show that it is possible to compose nearly strict non-incremental rewrit- 
ing algorithms. We can always recover a genuine non-incremental rewriting algo- 
rithm which might stop short, by discarding the length preserving rules. 

Proposition 2.8. Let (A, S) and (A, S') be nearly strict non-incremental rewriting
algorithms with reduction maps P and P' respectively. There is a nearly strict
non-incremental rewriting algorithm (A, A', T) which mimics the process of first applying
(A, S) and then applying (A, S'). The reduction map of (A, A', T) written out is the
composition P' o P.

Proof. This process can be carried out by a finite state Dehn machine. In its initial 
state it applies the rules in S. Once no more rules apply and it approaches the end 
of the word, it switches to a second state. In this state it simply compresses a little 
bit until it arrives at the start of the word again. Then it switches into a third state
where it uses the rules in S' modified, as in Lemma 2.7, for compressed input.

What if S includes ending rules? Without loss of generality, S-left-hand sides
are either W > 3 letters long, or anchored at both ends. Ending rules of length W
can be combined with compression. Rules anchored at both ends can be modified
so as to complete the entire reduction (P' o P) at a single step.

With S as above, our machine can recognize the end of a reduced word by 
finding any word of W - 1 letters which is anchored at the end but not the start.
(Shorter entire words being already dealt with.) It can then start backtracking and 
compressing. 

When the modified S' has ending rules, these become ending rules for the finite 
state machine. Finally, we transform the resulting Dehn machine with state into a 
nearly strict non-incremental rewriting algorithm. □ 



2.3. Group theoretic consequences. From Lemma 2.5 and the discussion at the 
start of this section we have the following result. 

Theorem 2.9. Let G be a group with finite semi-group generating sets Q and Q'.
Then G has a Cannon's algorithm with respect to Q if and only if it has one with
respect to Q'. □

Theorem 2.10. Let G be a group and let H be a finitely generated subgroup of G.
If G has a Cannon's algorithm, H has one too.



Proof. Choose a set of generators for G which includes generators for H. With 
respect to these generators, a Cannon's algorithm for G is also one for H. □ 

Theorem 2.11. Let G be a group and let H be a finite index subgroup of G. If H
has a Cannon's algorithm, G also has one.

Proof. Fix a transversal T for [H : G] from which we omit the representative of the
identity coset. Fix a finite generating set $Q$ for G containing T. Each word $g_1 g_2 g_3$
with $g_i \in Q$ is equal in G to a word of the form [h][t], for some $h \in H$ and $t \in T$,
where the brackets indicate that each letter may be omitted. If $g_1 g_2 \in Q^*$ evaluates
to an element of H, it can be written as the 0 or 1-letter word [h], again for some
$h \in H$. As $g_1, g_2, g_3$ vary in $Q$ we obtain finitely many elements $h \in H$. Let $\mathcal{H}$ be
a finite generating set for H containing all non-identity elements obtained in this
way and also all of $Q \cap H$.

The above equalities give rules R of the form $g_1 g_2 g_3 \to ht$ etc. We omit any
rules with $g_1 \in \mathcal{H}$. The incremental rewriting algorithm $(Q, Q \cup \mathcal{H}, R)$ turns a word
in $Q^*$ into a word in $\mathcal{H}^*$ followed by at most two letters from $Q$ by pushing a coset
representative along the word. If an input word to this algorithm represents an
element of H, the reduced word will be in $\mathcal{H}^*$.

Let $(\mathcal{H}, A, S)$ be a Cannon's algorithm for H. We claim that $(Q, Q \cup \mathcal{H} \cup A, R \cup S)$
is a Cannon's algorithm for G. The R rules translate the word into a word in $\mathcal{H}$
followed by a couple of letters keeping track of the coset. Then the S rules chase
along behind applying H's Cannon's algorithm to the word in $\mathcal{H}$. The effect is
exactly as if we applied $(Q, Q \cup \mathcal{H}, R)$ first, followed by applying $(\mathcal{H}, A, S)$ to the
$\mathcal{H}^*$ part of the result. If an input word represents the identity in G, the first
step produces a representation of the identity in $\mathcal{H}^*$ and the second deletes it.
An input word which does not represent the identity will reduce, either to some
word containing letters in $Q - \mathcal{H}$, if it does not evaluate into H, or otherwise to a
non-empty word in $(\mathcal{H} \cup A)^*$. □

The previous theorems hold for both Cannon's algorithms and non-incremental
Cannon's algorithms. The last of these suggests a way to construct a Cannon's 
algorithm using the non-incremental rewriting algorithm which does not satisfy 
Proposition 2.12. Consider the case of H finite index in G. It is not hard to parlay a
Cannon's algorithm for H into a non-incremental rewriting algorithm which solves 
the word problem in G but destroys information in the case where the word is not 
in the identity coset. Here is what it does: given a word w, it first transforms this 
into a word of the form ht where h is a word in the generators for H (possibly the
empty word) and t is an element of the transversal, and is empty if and only if it 
represents the identity coset. If t is empty, we now proceed to reduce h according 
to the Cannon's algorithm for H. On the other hand, if t is not empty, we can 
proceed to wantonly destroy the information in h. 

Proposition 2.12. There is a non-incremental rewriting algorithm which is not 
mimicked by any incremental rewriting algorithm. □ 

Theorem 2.13. If G and H both have Cannon's algorithms, then so does their
free product G * H . 

Proof. We suppose that $G_0$ and $G_1$ are groups with Cannon's algorithms $(Q_0, A_0, S_0)$
and $(Q_1, A_1, S_1)$ respectively and that the alphabets for these are disjoint. Let

$T_0 = \{au \to av \mid \hat{}u \to v \in S_0,\ a \in A_1\}$
$T_1 = \{au \to av \mid \hat{}u \to v \in S_1,\ a \in A_0\}$
$S = S_0 \cup T_0 \cup S_1 \cup T_1$
$A = A_0 \cup A_1$.

We claim that $(Q, A, S)$ is a Cannon's algorithm for $G_0 * G_1$.
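
As a rough illustration of how the combined rule set is assembled, here is a small Python sketch. It assumes our own encoding, which is not taken from the paper: a rule is a tuple `(lhs, rhs, anchored)` with `anchored` true for rules anchored at the start of the word, letters are single characters, and the helper names are hypothetical.

```python
def cross_rules(rules, other_alphabet):
    """Copies au -> av of each start-anchored rule u -> v, one for each
    letter a of the other factor's working alphabet."""
    return {(a + u, a + v)
            for (u, v, anchored) in rules if anchored
            for a in other_alphabet}


def free_product_rules(S0, A0, S1, A1):
    """S = S0 u T0 u S1 u T1, where the T rules are not anchored."""
    T0 = cross_rules(S0, A1)
    T1 = cross_rules(S1, A0)
    return set(S0) | set(S1) | {(u, v, False) for (u, v) in T0 | T1}


# Toy data (purely illustrative rules, not a Cannon's algorithm for any group).
S0 = {("ab", "c", True), ("cc", "a", False)}
S1 = {("xy", "", True)}
print(sorted(free_product_rules(S0, {"a", "b", "c"}, S1, {"x", "y"})))
```

The T rules let an anchored rule of one factor fire in the middle of a word, provided the preceding letter belongs to the other factor.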

To see this, consider a word $x_0 \ldots x_n$ consisting of alternating non-empty words
from the alphabets $Q_0$ and $Q_1$. For simplicity, we will assume that we have numbered
the two groups so that $x_i \in Q^*_{i \bmod 2}$. We claim that as long as no $x_i$ evaluates
to the identity, $R(x_0 \cdots x_n) = R_0(x_0) \cdots R_n(x_n)$. (Here we are using R to denote
reduction with respect to S and $R_i$ to denote reduction with respect to $S_{i \bmod 2}$.
Likewise, we will refer to $S_{i \bmod 2}$ as $S_i$ and $T_{i \bmod 2}$ as $T_i$.)

This claim is true when n = 0, for then only the rules of $S_0$ apply. Suppose
now that this claim holds for n = k. We wish to establish it for the case
n = k + 1. By induction an intermediate result of the reduction of $x_0 \cdots x_{k+1}$ is
$R_0(x_0) \cdots R_k(x_k) x_{k+1}$ and the portion before $x_{k+1}$ is fully reduced. Further, the
assumption that no $x_i$ evaluates to the identity implies that $R_k(x_k)$ is non-empty.
Accordingly any further reductions are made either by a non-anchored rule of
$S_{k+1}$ applying entirely inside $x_{k+1}$ or by a rule of $T_{k+1}$ applying at the last letter of
$R_k(x_k)$ and the beginning of $x_{k+1}$. Any rule of $T_{k+1}$ changes only the letters of $x_{k+1}$
and performs exactly as an anchored rule of $S_{k+1}$ would have done had $x_{k+1}$ been
the beginning of a word. These combine to produce $R_0(x_0) \cdots R_k(x_k) R_{k+1}(x_{k+1})$
as required.

In particular if no $x_i$ represents the identity, then $x_0 \ldots x_n$ does not represent
the identity and does not reduce to the empty word.

Now consider the case in which some $x_i$ represents the identity. We take $x_i$
to be the earliest such. The process of reducing the word $w = x_0 \ldots x_n$ produces
$R_0(x_0) \cdots R_{i-1}(x_{i-1}) x_i \cdots x_n$ as an intermediate result. As before, $S_i$ and
$T_i$ conspire to reduce $x_i$ as $S_i$ would have done had $x_i$ stood alone. This produces
$R_0(x_0) \cdots R_{i-1}(x_{i-1}) x_{i+1} \cdots x_n$. But this is an intermediate result of reducing
$w' = x_0 \cdots x_{i-1} x_{i+1} \cdots x_n$. Furthermore, w' represents the identity if and only if
w represents the identity. But the free product length of w' is two less than the
free product length of w. Thus we may assume inductively that w' reduces to the
empty word if and only if it represented the identity and we conclude the same
about w.

Since this induction reduces free product length by two, it remains to check two 
base cases. One is when the free product length of w is 0, and here there is nothing 
to check. The second is when the free product length is 1. This is just application 
of the Cannon's algorithm in one of the factor groups. □ 

We do not know how to prove this for non-incremental Cannon's algorithms. 
This raises the following 

Question 2.14. Are there groups with non-incremental Cannon's algorithms which
do not have Cannon's algorithms?

3. Groups with Expanding Endomorphism 

Let G be a finitely generated group with finite set of semi-group generators $Q$. Let
$\ell_Q$ denote the word metric on G with respect to $Q$. We say that a homomorphism
$\varphi : G \to G$ is an expanding endomorphism if $\varphi(G)$ is a finite index subgroup of G
and there exists a constant $M > 1$ such that $\ell_Q(\varphi(g)) \ge M \ell_Q(g)$ for all $g \in G$.
Observe that by taking a suitable power of $\varphi$ we may make M as large as we wish.
By taking a finite set of coset representatives for $\varphi(G) \backslash G$ we see that there is a
constant K such that for all $g \in G$, the distance from g to $\varphi(G)$ is at most K. We
say that $\varphi(G)$ is K-dense in G.

Let A be the finite alphabet $Q \cup \{t, t^{-1}\}$, where $t$ and $t^{-1}$ are letters not in $Q$. We
say that a word w in A is balanced (with respect to t) if w has the same number of
$t$'s as $t^{-1}$'s, and further, every initial segment of w has at least as many $t$'s as $t^{-1}$'s.
Each balanced word w in A represents an element of G: we define the element
represented by $t w t^{-1}$ to be $\varphi$ applied to the element represented by w.
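
To make the definition concrete, the following Python sketch checks the balance condition and evaluates a balanced word in one hypothetical special case: G is the integers written additively, the generators are +1 and -1, and the endomorphism is multiplication by 10. The token names `"t"` and `"T"` (for the inverse letter) are our own encoding.

```python
def is_balanced(word):
    """True if `word` has as many 't' as 'T' tokens and no initial segment
    has more 'T' than 't'. Here 'T' stands for the inverse letter of t."""
    depth = 0
    for letter in word:
        if letter == "t":
            depth += 1
        elif letter == "T":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def evaluate(word, phi=lambda n: 10 * n):
    """Integer represented by a balanced word; a block t w T contributes phi(w)."""
    if not is_balanced(word):
        raise ValueError("word is not balanced")
    stack = [0]
    for letter in word:
        if letter == "t":
            stack.append(0)               # start evaluating an inner word
        elif letter == "T":
            inner = stack.pop()
            stack[-1] += phi(inner)       # t w T represents phi(w)
        else:
            stack[-1] += letter           # a generator, +1 or -1
    return stack[0]


word = ["t", "t"] + [1] * 5 + ["T"] + [1] * 7 + ["T"] + [1] * 2
print(is_balanced(word), evaluate(word))
```

For a non-abelian group the additions would be replaced by multiplication in G, in the order the letters are read.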

The following rules (assuming $\varphi$ is chosen so that both M and K are sufficiently
large) give a Cannon's algorithm for G. In the rules: g denotes a word in $Q^*$, and
g' and g'' denote geodesic words in $Q^*$ such that $g = \varphi(g')g''$, and $\ell(g'')$ equals the
distance from g to $\varphi(G)$.

(1) Replace any non-geodesic word g of length $\ell(g) \le 2K$ by an equivalent
geodesic word.

(2) If g is geodesic, with $\ell(g) = 2K$, replace g by $t g' t^{-1} g''$, or replace $t^{-1} g$ by
$g' t^{-1} g''$.

(3) If g is geodesic, with $\ell(g) < 2K$ and $\ell(g'') = 0$ (i.e. $g \in \varphi(G)$), replace $t^{-1} g$
by $g' t^{-1}$.

(4) Replace $t t^{-1}$ by the empty word.

These rules clearly map balanced words to balanced words, and do not change
the element of G represented. It is clear that Rules 1, 3 and 4 are strictly length
decreasing. For Rule 2 to reduce length we need $\ell(g') + \ell(g'') + 2 < \ell(g)$. We have
$\ell(g) = 2K$, $\ell(g'') \le K$, and $\ell(g') \le \frac{1}{M}(\ell(g) + \ell(g''))$. It follows that Rule 2 will be
length decreasing if $3/M + 2/K < 1$.
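
For the reader's convenience, the arithmetic behind the last condition can be spelled out as follows. This is only a restatement of the three bounds quoted above, with $\ell(g) = 2K$ and $\ell(g'') \le K$ substituted in.

```latex
\begin{align*}
\ell(g') + \ell(g'') + 2
  &\le \tfrac{1}{M}\bigl(\ell(g) + \ell(g'')\bigr) + \ell(g'') + 2 \\
  &\le \tfrac{3K}{M} + K + 2 ,
\end{align*}
and $\tfrac{3K}{M} + K + 2 < 2K = \ell(g)$ holds exactly when
$\tfrac{3}{M} + \tfrac{2}{K} < 1$.
```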

Lemma 3.1. Let G, $A = Q \cup \{t, t^{-1}\}$, M and K be as above. Let w be the reduction
of a word in $Q^*$ with respect to Rules 1-4. Then w has the form
$t^n g_n t^{-1} \cdots t^{-1} g_1 t^{-1} g_0$, or just $g_0$ ($n = 0$), such that:

(1) each $g_i$ is a geodesic word in $Q^*$ of length less than 2K;

(2) each $g_i$, for $i < n$, is either in $G \setminus \varphi(G)$ or it is empty;

(3) if $n > 0$, $g_n$ is not empty.

Proof. We show first that all t's appear at the start of w. Initially this is vacuously 
true. The only rule whose application could make this untrue is 2 since it is the 
only rule which creates t's. But Rule 2 is only applied at the start of the word, or 
when the immediately preceding letter is t, for otherwise one of Rules 1-3 would 
apply at least one letter to the left. 

Rule 1 ensures that each $g_i$ is geodesic, while Rule 2 ensures that the length of
each $g_i$ is less than 2K. Rule 3 ensures that each $g_i$, for $i < n$, is either in $G \setminus \varphi(G)$
or it is empty. Rule 4 ensures that $g_n$ is not the empty word if $n > 0$. □

Theorem 3.2. Rules 1-4 reduce each word in Q to the empty word if and only if 
that word represents the identity element of G. 

Proof. Let $g = t^n g_n t^{-1} \cdots t^{-1} g_1 t^{-1} g_0$ be the reduction of a word in $Q^*$ representing
the identity in G. Let i be the least integer such that $g_i$ is non-trivial. Then if
$i < n$, (2) in Lemma 3.1 implies that g belongs to a non-1 coset of $\varphi^{i+1}(G)$. Therefore
$g_i$ is trivial for $i < n$. Hence $g_n$ represents the identity in G. By (1), $g_n$ is geodesic
and therefore trivial. By (3), $n = 0$ and so g itself is trivial. The converse is clear. □

The process we have just described is essentially that of writing the decimal
expansion of a number. Indeed, if you apply this to the sum of 572 1's, $1 + \cdots + 1$,
using the endomorphism $n \mapsto 10n$ you will get $t^2\, 5\, t^{-1}\, 7\, t^{-1}\, 2$. This is nothing but
the decimal 572 with t's performing the function of place notation. Unfortunately,
our decimal expansions can be a bit perverse. In addition to the numerals for the
numbers 0 through 9, we also have numerals for the numbers -1 through -9. Let
us give these the numerals $\bar{1}$ through $\bar{9}$. If you count up to 1,000,000 and then
count back down to 1, you will wind up writing 1 as
$t^6\, 1\, t^{-1} \bar{9}\, t^{-1} \bar{9}\, t^{-1} \bar{9}\, t^{-1} \bar{9}\, t^{-1} \bar{9}\, t^{-1} \bar{9}$,
i.e., as $1\bar{9}\bar{9}\bar{9}\bar{9}\bar{9}\bar{9}$. Evidently, we can write an arbitrarily long word for the number 1.
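
The perverse expansions are easy to reproduce with a toy model. The Python sketch below is not the rewriting system itself: it is a digit-level simplification, under our own assumptions, in which each place holds a signed digit between -9 and 9, letters are fed in one at a time as +1 or -1, and a digit that reaches +10 or -10 is cleared with a carry to the next place (mirroring Rule 2), while cancelling letters and empty leading places disappear (mirroring Rules 1 and 4).

```python
def step(digits, delta, base=10):
    """Add delta (+1 or -1) to a little-endian list of signed digits, in place.

    A digit reaching +base or -base is reset to 0 and the same delta is
    carried to the next place."""
    i = 0
    while True:
        if i == len(digits):
            digits.append(0)
        digits[i] += delta
        if abs(digits[i]) < base:
            break
        digits[i] = 0
        i += 1
    while digits and digits[-1] == 0:   # drop empty leading places
        digits.pop()
    return digits


def count(up, down):
    """Signed digits (most significant first) after `up` letters +1
    followed by `down` letters -1."""
    digits = []
    for _ in range(up):
        step(digits, +1)
    for _ in range(down):
        step(digits, -1)
    return digits[::-1]


print(count(572, 0))      # the usual decimal digits of 572
print(count(1000, 999))   # a long way of writing the number 1
```

Counting up to a larger power of ten before counting back down yields correspondingly longer expansions of 1.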

We say that a Cannon's algorithm is finite to one if as x varies over all words 
representing a fixed element of G, R(x) takes only finitely many values.

Remark 3.3. There are Cannon's algorithms which are not finite to one off the 
identity. 

For the purposes of Section 5 we would like to modify our Cannon's algorithm
to avoid this behavior. 

Given a reduced word $w = t^n g_n t^{-1} g_{n-1} \cdots g_1 t^{-1} g_0$, we call n the height of w.
Choose a positive integer N such that $(M/3)^N > K$. We add the following additional
rule to our system.

(5) If w is a reduced word, as above, with height at most N, such that $\ell_Q(w) <
\frac{1}{2}\ell(g_0)$, replace w by an equivalent geodesic word in $Q^*$.

Since there are only finitely many reduced words of height at most N, this 
introduces only finitely many rules. 

Lemma 3.4. Let w be the reduction of a word in $Q^*$ with respect to Rules 1-5. If
the height of w is n, and w does not represent the identity, then $\ell_Q(w) \ge (M/3)^n$.

Proof. For height $n = 0$ the lemma is clear. For $n > 0$ we can write $w = t w' t^{-1} g_0$,
where w' is reduced, of height $n - 1$, and not the identity, and $g_0$ is geodesic. For
$n \le N$, Rule 5 ensures that $\ell_Q(w) \ge \frac{1}{2}\ell(g_0)$. It follows that $3\ell_Q(w) \ge \ell_Q(w) +
\ell(g_0) \ge \ell_Q(t w' t^{-1}) \ge M \ell_Q(w')$. By induction the lemma holds for all $n \le N$.

For $n > N$, writing w as before, $\ell_Q(w) \ge M \ell_Q(w') - \ell(g_0)$. By induction,
$\ell_Q(w') \ge (M/3)^{n-1}$. Also $\ell(g_0) < 2K$ which, by our choice of N, is less than
$2(M/3)^{n-1}$. Therefore $\ell_Q(w) > (M - 2)(M/3)^{n-1} \ge (M/3)^n$ since $M \ge 3$. □

Corollary 3.5. If G admits an expanding endomorphism then G is virtually nilpo- 
tent. 

Proof. Each element $g \in G$ can be represented by a word w whose length is bounded
by $k \ln(\ell_Q(g) + 1)$, for some $k > 0$. Since there are only polynomially many such
words, G has polynomial growth and hence is virtually nilpotent. □

It is apparently unknown whether all torsion free nilpotent groups have expand- 
ing endomorphisms. However, we will see in the next section that they all have 
Cannon's algorithms. 

Theorem 3.6. If G has an expanding endomorphism, then G has a finite to one
Cannon's algorithm.

Proof. As in Corollary 3.5 the length $\ell_Q(g)$ of an element $g \in G$ gives a bound for
the maximum length of any reduced normal form representing g. Therefore there
are at most finitely many possible reduced normal forms for each element. □

Remark 3.7. The results of this section remain valid under the weaker hypothesis 
that G has a finite index subgroup H which admits an expanding endomorphism 
$\varphi$ with respect to $\ell_Q$. The only change that needs to be made is to replace $\varphi(G)$
with $\varphi(H)$ throughout.

This has the following corollary which we will need in our work on geometrically 
finite groups. 

Corollary 3.8. Let G be finitely generated and suppose that G has a finite index 
subgroup which has an expanding endomorphism. Let Q be a set of semi-group 
generators for G. Then for any $N > 0$ there exists a Cannon's algorithm as above,
with working alphabet $A = Q \cup \{t, t^{-1}\}$, such that any normal form word w with
$\ell_Q(w) < N$ is a geodesic word in $Q^*$. In particular, this holds when G is finitely
generated and virtually abelian. 

Proof. Let H be a finite index subgroup with expanding endomorphism. (In the
virtually abelian case, H is a finite index free abelian subgroup.) Raising to a
sufficient power furnishes us with an expanding endomorphism of H, with expansion
factor M such that $M/3 > N$. By Lemma 3.4 any normal form word w with
$\ell_Q(w) < N$ has height 0. □

We will call such a Cannon's algorithm N-geodesic.

We will say that a rule $u \to v$ is a local geodesic rule if both u and v are words in
the group generators and v is a geodesic. We will say that an N-geodesic Cannon's
algorithm $(Q, A, S)$ is N-tight if $(Q, A, S \cup R)$ is also an N-geodesic Cannon's algorithm
whenever R is a finite set of local geodesic rules and the left hand sides of S and R
are disjoint.

We record here the following observation. 



Proposition 3.9. The N-geodesic Cannon's algorithms of Corollary 3.8 are N-tight.

Proof. In this case, the rules of S determine that any sufficiently long geodesic g is
replaced with a word $t g' t^{-1} g''$ where g' and g'' are shorter geodesics. On the other
hand, S replaces any non-geodesic shorter than this with a geodesic. In particular,
no rule of R is ever applied. □

4. Nilpotent Groups

In this section we shall show that every finitely generated, torsion free nilpotent
group embeds in a group which has an expanding endomorphism. It follows from
Theorem 3.6 and Theorem 2.10 that every torsion free nilpotent group has a Cannon's
algorithm. Since every finitely generated nilpotent group is virtually torsion
free, it follows by Theorem 2.11 that every finitely generated virtually nilpotent
group has a Cannon's algorithm.

We start with the group of n x n upper triangular matrices with 1's on the
diagonal. Those with integer entries we denote by $U_n(\mathbb{Z})$, those with real entries
we denote by $U_n(\mathbb{R})$.

For each $\mu \in \mathbb{R}$ define $f_\mu : U_n(\mathbb{R}) \to U_n(\mathbb{R})$ as follows. If $x = (x_{ij}) \in U_n(\mathbb{R})$, set

$$f_\mu(x) = (\mu^{j-i} x_{ij}).$$



It is not hard to see that $f_\mu$ is a homomorphism and that if $\mu \in \mathbb{Z}$ then
$f_\mu : U_n(\mathbb{Z}) \to U_n(\mathbb{Z})$.
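
The homomorphism property can be checked numerically on examples. The Python sketch below assumes that the map is the standard dilation which multiplies the entry in position (i, j) by the (j - i)-th power of the parameter; this formula is our assumption, as are the helper names `mat_mul` and `f_mu`.

```python
def mat_mul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def f_mu(x, mu):
    """Assumed dilation: entry (i, j) of an upper triangular matrix is
    multiplied by mu ** (j - i); entries below the diagonal stay 0."""
    n = len(x)
    return [[mu ** (j - i) * x[i][j] if j >= i else 0 for j in range(n)]
            for i in range(n)]


x = [[1, 2, 3], [0, 1, 4], [0, 0, 1]]
y = [[1, 5, 6], [0, 1, 7], [0, 0, 1]]
# Homomorphism check on one pair of integer matrices.
print(f_mu(mat_mul(x, y), 3) == mat_mul(f_mu(x, 3), f_mu(y, 3)))
```

With an integer parameter and integer entries every entry of the image is again an integer, which is the second claim in the text.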

Let us fix a generating set $Q$ for $U_n(\mathbb{Z})$ and endow $U_n(\mathbb{R})$ with a left invariant
metric.



Lemma 4.1. The action of $U_n(\mathbb{Z})$ on $U_n(\mathbb{R})$ is co-compact by isometries and fixed
point free.



Proof. We wish to see that each $x = (x_{ij}) \in U_n(\mathbb{R})$ is a bounded distance away
from some $z = (z_{ij}) \in U_n(\mathbb{Z})$. For $p = 1, \ldots, n - 1$, let $m_p(x_1, \ldots, x_{n-p})$ be the
upper triangular matrix with 1's on the diagonal, $x_1, \ldots, x_{n-p}$ located distance p
above the diagonal, and 0's everywhere else. Notice that multiplying $x = (x_{ij})$ by
such an $m_p$ leaves unchanged the entries of x below the $p$-th off-diagonal and adds
$x_1, \ldots, x_{n-p}$ to the entries of x on the $p$-th off-diagonal. Consequently, we can choose
$m_1, m_2, \ldots, m_{n-1}$ each with entries between 0 and 1 so that $z = x m_1 m_2 \cdots m_{n-1} \in
U_n(\mathbb{Z})$. Since the entries of each $m_i$ are bounded in size, so is their product. Hence
z is a bounded distance away from x as required. □
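
The rounding procedure in this proof can be carried out mechanically. The following Python sketch is one way to do it, using exact rational arithmetic; the function name `round_into_integers` and the choice of rounding each entry up to the next integer are our own.

```python
from fractions import Fraction
from math import ceil


def mat_mul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def round_into_integers(x):
    """Return (z, ms) where z = x * m_1 * ... * m_{n-1} has integer entries
    and each m_p is unipotent with entries in [0, 1) on its p-th off-diagonal."""
    n = len(x)
    z = [[Fraction(v) for v in row] for row in x]
    ms = []
    for p in range(1, n):
        m = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
        for i in range(n - p):
            # Right multiplication by m adds m[i][i+p] to z[i][i+p] and leaves
            # the lower off-diagonals alone, so round this entry up to an integer.
            m[i][i + p] = ceil(z[i][i + p]) - z[i][i + p]
        z = mat_mul(z, m)
        ms.append(m)
    return z, ms


x = [[1, Fraction(1, 2), Fraction(1, 3)],
     [0, 1, Fraction(3, 4)],
     [0, 0, 1]]
z, ms = round_into_integers(x)
print(z)
```

Each correction matrix has entries between 0 and 1, so the product of the corrections stays in a bounded set, which is the point of the proof.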

Consequently, 

Lemma 4.2. There is $\lambda = \lambda_Q$ so that the embedding of the Cayley graph $\Gamma_Q(U_n(\mathbb{Z}))$
into $U_n(\mathbb{R})$ is a $(\lambda, 0)$ quasi-isometry.

Proof. It is a standard result that a co-compact discrete isometric action on a
geodesic metric space induces a $(\lambda, \epsilon)$-quasi-isometry. It is not hard to see that in
the case of a fixed point free action, we may take $\epsilon = 0$. □

Lemma 4.3. For $\mu > 1$, the map $f_\mu$ is a $\mu$-expanding endomorphism on $U_n(\mathbb{R})$.
That is, for $x, y \in U_n(\mathbb{R})$, $d(f_\mu(x), f_\mu(y)) \ge \mu\, d(x, y)$.

Proof. It suffices to show that $f_\mu$ is everywhere infinitesimally $\mu$-expanding. For
$X \in U_n(\mathbb{R})$, a tangent vector at X is given by $\frac{d}{dt}(X + At)$ where $A = (a_{ij})$ with
$a_{ij} = 0$ for all $j \le i$. Without loss of generality we may assume
$\|\frac{d}{dt}(I + At)\| = (\sum a_{ij}^2)^{1/2} =: n(A)$. Then by left invariance and linearity of
matrix multiplication

$$\left\| \tfrac{d}{dt}(X + At) \right\| = n(X^{-1}A).$$

Clearly $f_\mu$ extends linearly to all upper-triangular matrices and we have

$$\left\| \tfrac{d}{dt} f_\mu(X + At) \right\| = \left\| \tfrac{d}{dt}\bigl(f_\mu(X) + f_\mu(A)t\bigr) \right\| = n(f_\mu(X^{-1}A)) \ge \mu\, n(X^{-1}A).$$

□



Lemma 4.4. For $\mu \in \mathbb{Z}$, $f_\mu(U_n(\mathbb{Z}))$ is finite index in $U_n(\mathbb{Z})$.

Proof. The proof is the same as the proof that $U_n(\mathbb{Z})$ is co-compact in $U_n(\mathbb{R})$. □



Consequently, 

Lemma 4.5. If $\mu > \lambda^2$ and $\mu \in \mathbb{Z}$ then $f_\mu$ is an expanding endomorphism of
$U_n(\mathbb{Z})$.

Hence, by Theorem 3.6,

Lemma 4.6. $U_n(\mathbb{Z})$ has a finite to one Cannon's algorithm. □

Now it is a theorem (see [2D], Chapter 5) that 

Theorem 4.7. If G is a finitely generated, torsion free nilpotent group then G
embeds in $U_n(\mathbb{Z})$, for some $n > 0$. □

Hence, by Theorem 2.10 and Theorem 2.11,

Theorem 4.8. If G is finitely generated and virtually nilpotent, then G has a
Cannon's algorithm. □

5. Relatively hyperbolic groups 

In this section we prove a theorem concerning Cannon's algorithms for (strongly) 
relatively hyperbolic groups. We first proved this in the context of geometrically 
finite hyperbolic groups and these are the parade example of relatively hyperbolic
groups. The statement and proof here are close parallels of the geometrically finite 
case. 

There are multiple equivalent definitions of what it means for a group to be 
(strongly) hyperbolic relative to a collection of subgroups $\{P_1, \ldots, P_k\}$. These
are equivalent to Farb's [12] definition of relative hyperbolicity together with his
bounded coset penetration property. Usage of the term relatively hyperbolic varies 
slightly in that it is often possible to drop the requirement that the subgroups be 
finitely generated. In our usage these will all be finitely generated. 

The key geometric result is the relation of the geodesics and horoballs of the
negatively curved space to the geodesics of the subspace upon which the group acts
co-compactly. This is Lemma 5.7 here, the Morse lemma, Proposition 8.28 of [11].

Theorem 5.1. Suppose that G is hyperbolic relative to $\mathcal{P} = \{P_1, \ldots, P_k\}$. Suppose
also that for each i, $1 \le i \le k$, and any N, $P_i$ has a Cannon's algorithm which is
N-tight. Then G has a Cannon's algorithm. This Cannon's algorithm consists of
local geodesic rules together with Cannon's algorithms for the $P_i$.

Corollary 5.2. If G is a geometrically finite hyperbolic group, then G has a Can- 
non's algorithm. 

Corollary 5.3. If M is a graph manifold each of whose pieces is hyperbolic then
$\pi_1(M)$ has a Cannon's algorithm.

Corollary 5.4. Suppose that M is a finite volume negatively curved manifold with 
curvature bounded below and bounded away from zero. Then $\pi_1(M)$ has a Cannon's
algorithm. 

Corollary 5.5. Suppose that A and B are groups with N-tight Cannon's algorithms 
and that C is a finite group which includes as a subgroup of each of these. Then 
$A *_C B$ has a Cannon's algorithm.



Corollaries 5.2 and 5.3 follow directly from Theorem 5.1 since the groups in
question are hyperbolic relative to abelian (or virtually abelian) groups. Corollary 5.5
follows since the amalgam is hyperbolic relative to its factors. In the case
of Corollary 5.4 the groups are hyperbolic relative to nilpotent groups [12]. Nilpotent
groups have Cannon's algorithms by Theorem 4.8 but there is no guarantee
that these are N-tight for arbitrary N. It is only in the perhaps larger group of
upper triangular matrices where this is guaranteed. However, once we have proved
Theorem 5.1 we will see how to proceed here.

We suppose that G is hyperbolic relative to a finite collection of subgroups
$\{P_1, \ldots, P_k\}$. The parabolic subgroups of G are the G-conjugates of $\{P_1, \ldots, P_k\}$.
We take $\mathcal{P}$ to be the set of parabolic subgroups. The following are well known
properties of relatively hyperbolic groups. See, for example, [3], [10] and [11].

Basic properties 5.6.

(1) G acts discretely by isometries on a $\delta$-hyperbolic space $\mathbb{H}$.

(2) This action induces an action on the boundary $\partial\mathbb{H}$.

(3) There is a G equivariant family of horoballs $\{B_P \mid P \in \mathcal{P}\}$.

(4) For each $P \in \mathcal{P}$ we take $S_P$ to be $\partial B_P$. P acts co-compactly on $S_P$.

(5) G acts co-compactly on $X = \mathbb{H} \setminus (\bigcup_{P \in \mathcal{P}} B_P)$.

(6) Each horoball $B_P$ is quasiconvex. Consequently, there is a retraction $r_P :
X \to S_P$ which is inherited from the hyperbolic retraction of $\mathbb{H}$ onto $S_P$.
(We will also refer to this retraction as $r_S$ where S is the boundary of P.)

(7) For points sufficiently distant from $S_P$, the retraction $r_P$ shrinks X distance
by a super-linear factor. That is to say, there is a function $s(\cdot)$
with the property that for any linear function $y(x) = mx + b$, there is
$x_0$ such that for $x > x_0$, $s(x) > y(x)$, and there is $d_0$ so that if
$d = \min\{d_X(p, S_P), d_X(q, S_P)\} > d_0$ then

$$d_X(r_P(p), r_P(q)) < \frac{d_X(p, q)}{s(d)}.$$

(8) There is $\delta$ with the following property. Suppose that $S_0$ and $S_1$ are disjoint
horospheres, i.e., the boundaries of disjoint horoballs in $\mathcal{P}$. Suppose that $\gamma$
and $\gamma'$ are $\mathbb{H}$ geodesics that start in $S_0$ and end in $S_1$ and that x and x'
are the last points of $\gamma$ and $\gamma'$ in $S_0$. Then $d_X(x, x') < \delta$.

(9) There is $\delta$ so that if $S_0$ and $S_1$ are disjoint horospheres, then $r_{S_0}(S_1)$ has
$d_X$ diameter bounded by $\delta$.

(10) Given $\delta$ there is $\epsilon$ with the following property. Suppose S is the boundary
of horoball B. If $\gamma$ is an $\mathbb{H}$ geodesic that starts and ends on S then the
only portions of $\gamma$ lying in the $\delta$ neighborhood of $\mathbb{H} \setminus B$ are an initial and
terminal segment of $\gamma$, each of length at most $\epsilon$.

□

We need the following lemma which is Proposition 8.28 of [11].

Lemma 5.7. There is $\delta$ depending only on $\lambda$ and $\epsilon$ with the
following property. Suppose that $w$ is a $(\lambda, \epsilon)$ quasigeodesic in
$X$ and $\gamma$ is an $H$ geodesic with the same endpoints. Suppose that
$\Sigma = \Sigma(\gamma)$ is the union of $\gamma$ and the horospheres that it
meets. Then $w$ lies in a $\delta$ neighborhood of $\Sigma$. □

Given an $H$ geodesic $\gamma$, it meets a finite (possibly empty) collection of
horoballs $B_1, \dots, B_k$. Replace each portion $\gamma \cap B_i$ with an $X$
geodesic $\sigma_i$ to produce the $X$ path

$$\sigma = \gamma_0 \sigma_1 \gamma_1 \cdots \sigma_n \gamma_n.$$

We refer to a path formed in this way as a rough geodesic.

Lemma 5.8. There is a $\lambda$ such that every rough geodesic is an $X$
$\lambda$-quasigeodesic.

Proof. Suppose that $\gamma$ is an $H$ geodesic and
$\sigma = \gamma_0 \sigma_1 \gamma_1 \cdots \sigma_n \gamma_n$ is a corresponding
rough geodesic. Suppose that $\sigma'$ is a corresponding $X$ geodesic. By Lemma
5.7 this lies in a $\delta$ neighborhood of $\Sigma = \Sigma(\gamma)$. Let us
decompose $\sigma'$ as
$\sigma' = \gamma'_0 \sigma'_1 \gamma'_1 \cdots \sigma'_n \gamma'_n$, where
$\sigma'_i$ is the portion of $\sigma'$ which lies within $\delta$ of the
horosphere for $\sigma_i$, but not within $\delta$ of $\gamma$. Some of these may
be empty. However, it follows that for each $i$, $\gamma_i$ and $\gamma'_i$ lie
within $2\delta$ of each other. Since each of these is geodesic, the difference
in their lengths is bounded. Similarly, for each $i$, the endpoints of $\sigma_i$
and $\sigma'_i$ are close to each other, thus bounding the difference in their
lengths. Accordingly, the difference in lengths along $\sigma$ and $\sigma'$
arises only from these breakpoints, each of which contributes only a bounded
difference. Since there is a minimum distance between horospheres, these
breakpoints are bounded away from each other. The result follows. □

We record here two general properties of $\delta$-hyperbolic spaces. (Here we use
the parameterized version of $\delta$-hyperbolicity.)

Proposition 5.9. Given $\delta' > \delta$, there are $(\lambda, \epsilon)$ with the
following property. Suppose $\gamma$ is a piecewise geodesic. Suppose that each
segment of $\gamma$ has length at least $\delta' + 1$, and that at each bend, both
segments depart a $\delta$ neighborhood of each other after travelling at most
distance $\delta'$ from that bend. Then $\gamma$ is a $(\lambda, \epsilon)$
quasigeodesic. □

Proposition 5.10. Suppose that $(\lambda_1, \epsilon_1)$ and
$(\lambda_2, \epsilon_2)$ are given. Then there is $(\lambda_3, \epsilon_3)$ with
the following property. If $\sigma$ is a $(\lambda_1, \epsilon_1)$ quasigeodesic
and $\tau$ is formed from $\sigma$ by replacing disjoint subpaths with
$(\lambda_2, \epsilon_2)$ quasigeodesics, then $\tau$ is a
$(\lambda_3, \epsilon_3)$ quasigeodesic. □

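Suppose $G$ is hyperbolic relative to $P_1, \dots, P_k$. We would like to find a
generating set $\mathcal{G}$ in which $P_1, \dots, P_k$ are convex in the Cayley
graph of $G$. Given a set of generators $\mathcal{G}'$ for $G$ and $K > 0$, set

$$\mathcal{A}_i(K) = \{ g \in P_i \mid d_X(1, g) < K \}$$

and

$$\mathcal{G} = \mathcal{G}(K) = \mathcal{G}' \cup \Big( \bigcup_i \mathcal{A}_i(K) \Big).$$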

Lemma 5.11. Given $K$ sufficiently large, $\mathcal{G}$ has the following
properties:

(1) There are constants $A$ and $B$ with the following properties: Suppose $w$ is
a $\mathcal{G}$-geodesic. Let $S_P$ be a horosphere with $P$ conjugate to $P_i$.
Suppose $w$ begins and ends at $X$ distance at most $d$ from $S_P$. Then
$w = xyz$, where $\ell(x) < Ad + B$, $\ell(z) < Ad + B$, and
$y \in (\mathcal{A}_i(K))^*$.

(2) If $w$ begins on $S_P$, $x$ is empty. If $w$ ends on $S_P$, $z$ is empty. In
particular, a $\mathcal{G}$-geodesic evaluating into $P_i$ is written in letters
all of which lie in $P_i$.

(3) If we fix $d$ then if $w$ is sufficiently long, $y$ is non-empty.

Proof. We claim that there is a bound $r$ independent of $K$ so that if $e$ is a
$\mathcal{G}(K)$ edge which does not lie in $P$, then the $X$ length of $r_P(e)$
is less than $r$. If $e$ is an $\mathcal{A}_i(K)$ edge which does not lie in $P$,
then it lies within a bounded distance of some horosphere $S_{P'}$ other than
$S_P$. By property (9) of Proposition 5.6, $r_{S_P}(S_{P'})$ has bounded diameter.
There are only finitely many $\mathcal{G}'$ letters and their edges also have
bounded retractions onto $S_P$. This gives the bound $r$.

Now consider the case of a geodesic $w$ which begins and ends in $P$. We wish to
show that all edges of $w$ lie in $P$. If this fails, we replace $w$ with a
sub-segment whose only contact with $P$ are its two endpoints, $p$ and $q$. Notice
that it must therefore have length at least 2 since it leaves and returns to
$S_P$. Let $d_1$ be the maximum distance from $S_P$ to $P$. Then the path $r_P(w)$
starts and ends within distance $d_1$ of $w$. Thus
$\ell(w) > \frac{d_X(p,q) - 2d_1}{r}$. Now consider an $X$-geodesic from $p$ to
$q$. This has length at most $d_X(p,q) + 2d_1$ and each point of it lies within
distance $d_1$ of $P$. It follows that the $\mathcal{A}_i(K)$ distance between $p$
and $q$ is at most $\frac{d_X(p,q) + 2d_1}{K - d_1} + 1$. Choosing $K$
sufficiently large contradicts the assumption that $w$ was geodesic.

Now consider the case in which $p$ and $q$ do not necessarily lie on $S_P$. Let
$p''$ and $q''$ be their respective projections onto $S_P$ and $p'$ and $q'$ be
points of $P$ near these. There are $\lambda$ and $\epsilon$ depending on $K$ so
that the embedding of $G$ into $X$ is a $(\lambda, \epsilon)$ quasi-isometry.
Consider $tuv$ with $t$ a geodesic from $p$ to $p'$, $u$ a geodesic from $p'$ to
$q'$ and $v$ a geodesic from $q'$ to $q$. Then

$$\ell(w) \le \ell(tuv) \le 2\lambda d + 2\epsilon + \frac{d_X(p',q') + 2d_1}{K - d_1} + 1.$$

Now if $y$ does not appear in $w$, i.e., $w$ contains no subword lying in $P$,
then

$$\frac{d_X(p',q') - 2d_1}{r} + 1 \le \ell(w) \le 2\lambda d + 2\epsilon + \frac{d_X(p',q') + 2d_1}{K - d_1} + 1.$$

The value of $\lambda$ can only decrease as $K$ increases, since $\lambda$
measures how many $\mathcal{G}(K)$ letters it takes to travel a certain distance
in $X$, and for $K$ sufficiently large, $K - d_1 > r$. Thus, for any sufficiently
large $K$, there is a linear bound $\ell(w) < A'd + B'$ on those $w$ for which $y$
is empty.

We now suppose $w = xyz$ where $y$ is the maximal portion of $w$ lying in $P$ and
is non-empty. Let $p'''$ and $q'''$ be the endpoints of $y$. We claim that these
must lie a bounded distance from $p'$ and $q'$. To see this, notice that $w$ is an
$X$ quasigeodesic. It follows from Lemma 5.7 that $w$ fellow travels the union of
its $X$ geodesic and the horospheres that this geodesic meets. It is not hard to
see that if $d_X(p, q)$ is sufficiently large, this $X$ geodesic meets $S_P$ near
$p''$ and $q''$. □

For $i \ne j$, $P_i$ and $P_j$ meet in a finite (perhaps trivial) subgroup. We
will assume that $K$ is chosen large enough so that any non-trivial elements
common to two or more subgroups appear as generators. After choosing $K$, we will
refer to $\mathcal{G}(K)$ and $\mathcal{A}_i(K)$ as $\mathcal{G}$ and
$\mathcal{A}_i$.

We are now in a position to describe the Cannon's algorithm of Theorem 5.1. This
depends on constants $D$ and $E$. For each $i$, let $(\mathcal{A}_i, A_i, S_i)$ be
a $D$-tight Cannon's algorithm for $P_i$. We will assume that any rules operating
inside a common subgroup rewrite immediately to a single letter and thus, these
rules agree between the different $S_i$. We take $S_{\mathcal{G}}$ to be a
collection of local geodesic rules which contain a left-hand side for each
$\mathcal{G}$ word which is not a geodesic. We assume that these agree with any
rules which also appear in some $S_i$. We will assume $D > E$. We take
$A = \mathcal{G} \cup (\bigcup_i A_i)$ and
$S = S_{\mathcal{G}} \cup (\bigcup_i S_i)$. We will show that with $E$
sufficiently large, $\mathcal{D} = (\mathcal{G}, A, S)$ is a Cannon's algorithm.
This requires a series of lemmas.

We first check that the parabolic subgroup sub-Cannon's algorithms are still
effectively $D$-tight within $\mathcal{D}$.

Lemma 5.12. Suppose that $w$ is the result of $\mathcal{D}$-reducing a
$\mathcal{G}$ input word and that $u$ is a maximal $A_i$ subword of $w$. Then $u$
is a reduced word for a $D$-geodesic Cannon's algorithm for $P_i$.

Proof. Consider the process by which $u$ is produced. Since the $A_j$ are
disjoint, for $j \ne i$, no $S_j$ rule can apply in the production of $u$.
Consequently the formation of $u$ is carried out by $S_i$ rules and
$S_{\mathcal{G}}$ rules. Notice that any non-$\mathcal{A}_i$ input letters which
are consumed in the production of $u$ must first be turned into $\mathcal{A}_i$
letters prior to their consumption by $S_i$. This is done by $S_{\mathcal{G}}$
rules shortening non-geodesics into geodesics which must be in $\mathcal{A}_i$
letters. Therefore, $u$ could have been produced by applying the
$S_{\mathcal{G}}$ rules and $S_i$ rules to an input word in $\mathcal{A}_i^*$. The
result now follows from the assumption that Cannon's algorithm for $P_i$ is
tight. □

It now follows that if $w$ is the result of $\mathcal{D}$-reducing a
$\mathcal{G}^*$ input word, then $w$ consists of reduced words from the parabolic
subgroups alternating with $E$-local geodesics which do not contain any parabolic
letters. These $E$-local geodesics may be empty, but by assuming that the
parabolic subwords are maximal, we may assume that no two adjacent parabolic
subwords lie in the same parabolic subgroup. Note that if two or more $P_i$ meet
in a non-trivial finite subgroup, any ambiguity where one parabolic subgroup ends
and another begins can only consist of a single letter.

We will choose to decompose $w$ in a slightly different manner. We choose a
parameter $F < D$. We decompose $w$ as

$$w = g_0 p_1 \cdots g_{m-1} p_m g_m$$

where the $p_j$ are the maximal parabolic subwords which represent group elements
of length greater than $F$. Since all other maximal parabolic subwords represent
group elements of length less than or equal to $D$, each is an
$\mathcal{A}_i$-geodesic for some $i$. It follows that the $g_j$ are $E$-local
geodesics. Again, some of the $g_j$ may be empty, but not if they lie between
$P_i$ words.

Lemma 5.13. For $D$, $E$ sufficiently large, there is
$(\lambda, \epsilon) = (\lambda_{D,E,F}, \epsilon_{D,E,F})$ such that each $g_i$
is a $(\lambda, \epsilon)$-quasigeodesic in $H$. While increasing $F$ weakens the
quasigeodesity, increasing $D$ and $E$ does not.

Proof. Let $u$ be an $E$-local geodesic of length $E$. This is a Cayley graph
geodesic, and hence an $X$ $(\lambda, 0)$-quasigeodesic, with $\lambda$ depending
only on the embedding of $\Gamma$ into $X$. By Lemma 5.7, $u$ asynchronously
fellow-travels its $H$ geodesic $\gamma$ together with any horospheres that
$\gamma$ enters. Now $\gamma$ cannot stray far into any horosphere, for otherwise
$u$ would contain parabolic subwords of length greater than $F$. This bounds the
ratio between the $X$ length and the $H$ length of $\gamma$. Notice that this
bound is independent of $E$. Thus, by increasing $E$, we proportionally increase
the $X$-length of $u$. That is to say, there is $(\lambda', \epsilon')$ such that
if $u$ is a Cayley graph geodesic containing no parabolic subword of length
greater than $F$, then $u$ is an $H$ $(\lambda', \epsilon')$-quasigeodesic.

It is a standard result for $\delta$-hyperbolic spaces that given
$(\lambda', \epsilon')$, for $E$ sufficiently large, there is
$(\lambda, \epsilon)$ so that every $E$-local $(\lambda', \epsilon')$-quasigeodesic
is a $(\lambda, \epsilon)$ quasigeodesic. Thus, choosing $E$ (and hence $D$)
sufficiently large makes each $g_j$ an $H$ $(\lambda, \epsilon)$ quasigeodesic as
required. □
Consider the decomposition of $w$ into

$$w = g_0 p_1 \cdots g_{m-1} p_m g_m$$

as above. Ultimately, we must show that $w$ is empty if and only if the input word
which created it represents the identity. We will examine several paths related to
$w$, namely

$$\sigma = \sigma(w, F) = \gamma_0 \pi_1 \cdots \gamma_{m-1} \pi_m \gamma_m$$

$$\pi = \pi(w, F) = g_0 \pi_1 \cdots g_{m-1} \pi_m g_m$$

$$\nu = \nu(w, F) = g_0 q_1 \cdots g_{m-1} q_m g_m$$

where

• Each $\gamma_i$ is the $H$-geodesic for the corresponding $g_i$,

• Each $\pi_i$ is the $H$-geodesic for the corresponding $p_i$,

• Each $q_i$ is a Cayley graph geodesic for the corresponding $p_i$.

Lemma 5.14. Given $D$, $E$, $F$ sufficiently large:

• There is $(\lambda, \epsilon)$ such that $\sigma$ is an $H$
$(\lambda, \epsilon)$-quasigeodesic. Increasing $D$ and $E$ does not worsen this
quasigeodesity.

• There is $(\lambda, \epsilon)$ such that $\pi$ is an $H$
$(\lambda, \epsilon)$-quasigeodesic.

• There is $(\lambda, \epsilon)$ such that $\nu$ is a Cayley graph
$(\lambda, \epsilon)$-quasigeodesic.

Proof. We first consider $\sigma$. We choose $F$ sufficiently large. Since each
$\pi_i$ is long, by property (10) of Proposition 5.6, it spends only a limited
time in a neighborhood of the exterior of its horoball. On the other hand, each
$\gamma_i$ can only spend a bounded time in the neighborhood of the horoballs it
starts and ends at, for otherwise, by Lemma 5.11, it would start or end in the
corresponding parabolic letters, contradicting the maximality of the parabolic
subword preceding it (at its beginning) or following it (at its end). Thus, the
only way $\sigma$ can fail to satisfy the assumptions of Proposition 5.9 is if one
or more of the $\gamma_i$ is short, i.e., of $X$ length less than $\delta' + 1$.
In this case, we modify $\sigma$ to produce $\sigma'$ by deleting each short
$\gamma_i$ and replacing $\pi_i$ with $\pi'_i$ starting at the beginning of
$\gamma_i$. Clearly $\sigma$ and $\sigma'$ asynchronously fellow travel. By
Proposition 5.9, $\sigma'$ is an $H$ quasigeodesic, and thus, so is $\sigma$.

It now follows by Proposition 5.10 that $\pi$ is an $H$ quasigeodesic.

Finally, it follows from Lemma 5.8 that $\nu$ is an $X$ quasigeodesic and hence a
Cayley graph quasigeodesic. □

In the case where each $P_i = \langle \mathcal{A}_i \rangle$ has the falsification
by fellow traveler property, this gives Lemma 4.7 of [18]. It then follows that
the language of geodesics in $G = \langle \mathcal{G} \rangle$ is a regular
language and that the growth of $G = \langle \mathcal{G} \rangle$ is rational.
This includes the limit groups of [21] since, as [8] has shown, these are
hyperbolic relative to abelian subgroups.



Proof. (Theorem 5.1) We suppose that $w$ is the result of $\mathcal{D}$-reducing
an input word $v \in \mathcal{G}^*$. We must show that $w$ is empty if and only if
$v$ represents the identity. Since $w$ remembers its group element, the "only if"
part is clear.

Suppose now that $v$ represents the identity. Then $\sigma(w)$ is an
$H$-quasigeodesic. Since it represents the identity, this bounds its length. This,
in turn, bounds the length of $w$. Recall that increasing $D$ and $E$ does not
worsen the quasigeodesity of $\sigma$, and thus does not degrade the bound on the
length of $w$. We may then assume that $D$ and $E$ are greater than this bound.
Thus, $w$ is a geodesic, in particular, a geodesic for the identity, and thus
empty as required. □
Proof. (Corollary 5.4) Let $G = \pi_1(M)$ where $M$ is a finite volume negatively
curved manifold with curvature bounded below and bounded away from 0. By [12], $G$
is hyperbolic relative to nilpotent subgroups $P_1, \dots, P_k$. Now each $P_i$
has a Cannon's algorithm by Theorem 4.8. However, there is no guarantee that this
is $N$-tight for $P_i$. It is, however, $N$-tight for the matrix group
$U_n(\mathbb{Z})$. Given any finite generating set $V$ for $P_i$, we may include
these into a generating set for $U_n(\mathbb{Z})$. Now, if $N > 1$, any $N$-tight
Cannon's algorithm for $U_n(\mathbb{Z})$ is 1-tight. It follows that for each
$p \in V$, $p$ is the unique reduced word for itself. In particular, this is a
1-tight Cannon's algorithm for $P_i$.



Since Lemma 5.11 holds for any sufficiently large $K$, we can assume that
$\mathcal{A}_i$ contains any finite subset of $P_i$ we select.

Now consider the paths of Lemma 5.14. The decompositions depend on a parameter,
$F$, and this parameter is stated in terms of Cayley graph length. However, it is
only used to ensure that each $\pi_i$ is long, i.e., that the $H$ geodesic of this
group element is long. By choice of $K$ and hence $\mathcal{A}_i$, we can force
this to be the case for any parabolic group element whose reduced word is at least
two letters long. The proof now proceeds as before. □

6. Histories, Compression, Splitting and Splicing 

This and the following section are devoted to showing that certain groups do not 
have Cannon's algorithms. In this section we develop tools that apply to any de- 
terministic length-reducing rewriting system. Thus we will be able to show that a 
particular group G has neither a Cannon's algorithm, nor a non-incremental Can- 
non's algorithm. We believe that these results also hold for non-deterministic Can- 
non's algorithms. These latter are related to growing context sensitive languages. 
Extension of our methods to this case is work in progress. 

Let $w_0, \dots, w_n$ be the sequence of words produced as a rewriting algorithm
makes $n$ substitutions on $w_0$. We call this sequence the history to time $n$ of
$w_0$. We can draw a diagram of the history as follows. Draw $w_0$ as a row of
$\ell(w_0)$ adjacent unit squares, labelled with the letters of $w_0$. For each
$i > 0$ we draw $w_i$ below $w_{i-1}$ as follows. Draw a line segment under the
first left-hand side appearing in $w_{i-1}$. (We call this a substitution line.)
Underneath it put a row of equal width, height 1 rectangles, labelled with the
corresponding right-hand side, or if the right-hand side is empty, put a single
black rectangle. Fill the remainder of the row with a copy of whatever appears in
that part of $w_{i-1}$.

The width of a letter of $w_i$ is the width of its rectangle in the diagram. The
width of a subword of $w_i$, not to be confused with its length, is the sum of the
widths of the letters making it up (i.e. disregarding any black rectangles).
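
The history of a word can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule table `rules`, the helper names, and the convention that "the first left-hand side" means the occurrence starting furthest to the left (ties going to the longer left-hand side) are all hypothetical choices made here, not taken from the text.

```python
from typing import Dict, List, Optional, Tuple

Rules = Dict[str, str]  # left-hand side -> strictly shorter right-hand side


def first_left_hand_side(word: str, rules: Rules) -> Optional[Tuple[int, str]]:
    """Return (position, lhs) of the occurrence chosen for substitution.

    Assumed convention: leftmost starting position wins; among occurrences
    starting at the same position, the longer left-hand side wins.
    """
    best = None
    for lhs in rules:
        pos = word.find(lhs)
        if pos == -1:
            continue
        key = (pos, -len(lhs))
        if best is None or key < best[0]:
            best = (key, pos, lhs)
    return None if best is None else (best[1], best[2])


def history(word: str, rules: Rules, max_steps: Optional[int] = None) -> List[str]:
    """Return the list w_0, ..., w_n of words produced by successive substitutions."""
    for lhs, rhs in rules.items():
        if len(rhs) >= len(lhs):
            raise ValueError("every rule must be length-reducing")
    words = [word]
    while max_steps is None or len(words) - 1 < max_steps:
        hit = first_left_hand_side(words[-1], rules)
        if hit is None:
            break  # the current word is reduced
        pos, lhs = hit
        current = words[-1]
        words.append(current[:pos] + rules[lhs] + current[pos + len(lhs):])
    return words


# Free reduction on two generators, with capital letters as formal inverses.
FREE_REDUCTION = {"aA": "", "Aa": "", "bB": "", "Bb": ""}
```

Because every rule shortens the word, the loop stops after at most $\ell(w_0)$ substitutions, so `max_steps` is only needed to look at a partial history.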

In order to get a handle on how the number of letters in a word decreases as the 
algorithm runs, we consider how the widths of letters increase. 

Lemma 6.1. Let $W$ be the length of the longest left-hand side of the rewriting
system. In a diagram, the letters of any right-hand side (under a substitution
line) have width at least $W/(W-1)$ times that of the narrowest letter in the
corresponding left-hand side (above the substitution line).

Proof. If we were to first make all the letters of the left-hand side equal in width, 
deleting any black rectangles which appear, we would certainly not make the nar- 
rowest letter any narrower. Then at least one letter, of at most W, is removed, 
giving a further expansion of at least the stated factor. □ 
Figure 1. A diagram for one possible history of a rewriting system. The numbers
indicate generations. Dotted lines indicate a splitting path.



Next we define the generation of each letter in a diagram. The generation of each
letter in the first row is 0. The generation of a letter in row $i > 0$ is the
generation of the letter above it, if it is not in a right-hand side, or one more
than the least generation of the letters above the substitution line, if it is in
a right-hand side.



Lemma 6.2. If the generation of a letter is $n$ then its width is at least
$\left(\frac{W}{W-1}\right)^n$.


Proof. True for row 0. Suppose it is true for row $i - 1$. Since each letter in
row $i$ not in a right-hand side has the same generation and width as the
corresponding letter in the row above, the assertion holds for these letters. By
Lemma 6.1 any letter in a right-hand side is at least $W/(W-1)$ times the width of
the narrowest letter in the corresponding left-hand side. But the generation of
each right-hand side letter exceeds the generation of the narrowest left-hand side
letter by at most one. Since the assertion is assumed to hold for the narrowest
letter in the left-hand side, it holds for the letters of the right-hand side. □
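
The bookkeeping of widths and generations can be checked mechanically on small examples. The following is a minimal sketch, assuming single-character letters, non-empty left-hand sides with $W \ge 2$, and a leftmost-occurrence convention for choosing the substitution; the names `step`, `diagram` and `check_lemma_6_2` are hypothetical. Exact rational arithmetic is used so the comparison with $(W/(W-1))^n$ is not affected by rounding.

```python
from fractions import Fraction
from typing import Dict, List, Optional, Tuple

# A cell is (letter, width, generation); letter None marks a black rectangle.
Cell = Tuple[Optional[str], Fraction, Optional[int]]


def step(row: List[Cell], rules: Dict[str, str]) -> Optional[List[Cell]]:
    """Apply one substitution to a diagram row, or return None if reduced."""
    letters = [i for i, cell in enumerate(row) if cell[0] is not None]
    word = "".join(row[i][0] for i in letters)
    hits = [(word.find(lhs), -len(lhs), lhs) for lhs in rules if lhs in word]
    if not hits:
        return None
    pos, _, lhs = min(hits)  # assumed convention: leftmost, then longest
    first, last = letters[pos], letters[pos + len(lhs) - 1]
    span = row[first:last + 1]  # may include black rectangles in between
    total = sum((cell[1] for cell in span), Fraction(0))
    gen = 1 + min(cell[2] for cell in span if cell[0] is not None)
    rhs = rules[lhs]
    if rhs:
        new = [(ch, total / len(rhs), gen) for ch in rhs]  # equal widths
    else:
        new = [(None, total, None)]  # a single black rectangle
    return row[:first] + new + row[last + 1:]


def diagram(word: str, rules: Dict[str, str]) -> List[List[Cell]]:
    """Rows of the diagram, starting from unit squares of generation 0."""
    rows = [[(ch, Fraction(1), 0) for ch in word]]
    while True:
        nxt = step(rows[-1], rules)
        if nxt is None:
            return rows
        rows.append(nxt)


def check_lemma_6_2(word: str, rules: Dict[str, str]) -> bool:
    """Check width >= (W/(W-1))**generation for every letter in the diagram."""
    W = max(len(lhs) for lhs in rules)  # assumes W >= 2
    ratio = Fraction(W, W - 1)
    return all(width >= ratio ** gen
               for row in diagram(word, rules)
               for letter, width, gen in row if letter is not None)
```

For instance, with the hypothetical rules `{"aa": "b", "bb": "a"}` the word `aaaa` reduces to a single letter of generation 2 and width 4, which meets the bound $(2/1)^2$ with equality.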

Definition 6.3. See Figure 1. A splitting path of length $n$ in a diagram for
$w_0, \dots, w_t$ consists of $n$ vertical line segments running between letters,
from the top of the diagram to the bottom, such that successive segments either
join end to end, or are linked by a substitution line. Segments may not cut
substitution lines. For each segment, the substitution lines between the top and
bottom of the segment all lie to the same side of the segment.

Lemma 6.4. If $w_t$ contains a letter of generation $g$, then the diagram contains
a splitting path of length at most $2g + 2$ ending next to the letter. We may
choose the path to end on either side of it.

Proof. Start at the bottom of the diagram with a vertical segment next to the
letter of generation $g$. Extend upward until we come to a substitution line.
Above that line will be a letter of generation $g - 1$. Start a new segment next
to that letter and continue on up. After hitting at most $g$ substitution lines we
reach the top of the diagram. (If we hit an endpoint of a substitution line we
start a new segment only if the letter we are following is under the line.)

This is not yet a splitting path: our vertical segments could still have
substitution lines on both sides. When this happens it can only be with
substitutions to the left in the upper part of the segment and to the right in the
lower. (A sequence of substitutions going right to left would have to cross the
vertical segment because such substitutions always overlap.) We split each such
segment at the appropriate point and we are done. □

Associated with each splitting path are its details: For each vertical segment we
record whether any substitutions take place to the left or the right. (If neither,
we can arbitrarily designate it as left.) For a left segment we record the first
$W - 1$ letters to its right (which will be constant), or to the end of the word
if nearer. For a right segment we record the $W - 1$ letters to the left, or to
the start of the word if nearer. If a segment ends on a substitution line we
record the left-hand side, the position at which the path splits it (in the range
$0, \dots, W$) and the position at which the next segment splits the right-hand
side (in the range $0, \dots, W - 1$).

We say that two splitting paths (in different diagrams) are equivalent if they 
have the same details. (Note: we do not require vertical segments to be the same 
height.) 

For example, the details of the splitting path shown in Figure 1 might be given
as: (left, "aaa"), (right, "aaa", "aaab", 3, 2), (left, "aaa"), (right, "baa",
"aaab", 2, 3), (left, "aa").
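
The details of a splitting path are finite data, so equivalence of paths is just equality of records. The sketch below encodes the example above; the tuple layout and the names `make_detail` and `equivalent` are hypothetical, and the value $W = 4$ is an assumption chosen so that the recorded words fit the stated ranges.

```python
from typing import Optional, Sequence, Tuple

# (side, recorded context letters, optional (lhs, split in lhs, split in rhs))
Detail = Tuple[str, str, Optional[Tuple[str, int, int]]]


def make_detail(W: int, side: str, context: str,
                lhs: Optional[str] = None,
                lhs_split: int = 0, rhs_split: int = 0) -> Detail:
    """Build the record for one vertical segment, checking the stated ranges."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    if len(context) > W - 1:
        raise ValueError("at most W - 1 context letters are recorded")
    if lhs is None:
        return (side, context, None)  # segment joined end to end with the next
    if not 1 <= len(lhs) <= W:
        raise ValueError("a left-hand side has between 1 and W letters")
    if not (0 <= lhs_split <= W and 0 <= rhs_split <= W - 1):
        raise ValueError("split position out of range")
    return (side, context, (lhs, lhs_split, rhs_split))


def equivalent(path_a: Sequence[Detail], path_b: Sequence[Detail]) -> bool:
    """Two splitting paths are equivalent when their details coincide.

    Segment heights are never recorded, so they play no role here.
    """
    return list(path_a) == list(path_b)


W = 4  # assumed longest left-hand side for this example
EXAMPLE_PATH = [
    make_detail(W, "left", "aaa"),
    make_detail(W, "right", "aaa", "aaab", 3, 2),
    make_detail(W, "left", "aaa"),
    make_detail(W, "right", "baa", "aaab", 2, 3),
    make_detail(W, "left", "aa"),
]
```

Since each record is drawn from a finite set depending only on $W$ and the alphabet, the number of equivalence classes of paths of bounded length is finite.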

Remark 6.5. There are no more than $(2(W+1)^2|A|^{2W+1})^{n+1}$ equivalence classes
of splitting path of length less than or equal to $n$.

Given a splitting path for $w_0, \dots, w_t$, we define $w_i^-$ and $w_i^+$ to be
the subwords of $w_i$ to the left and the right respectively of the path. The next
lemma can be interpreted as telling us that the detail of the splitting path is
like a message that is passed between $w^-$ and $w^+$: if $v^-$ sends the same
message as $w^-$, $w^+$ won't notice the change.

Lemma 6.6. Let $v_0, \dots, v_r$ and $w_0, \dots, w_s$ contain equivalent splitting
paths. Then the history of $v_0^- w_0^+$, up to a suitable time, contains an
equivalent splitting path, and ends with the word $v_r^- w_s^+$.

Proof. Cut the histories of $v_0$ and $w_0$ along their respective splitting
paths. Fit the left half of $v_0$'s history with the right half of $w_0$'s
history. The lengths of vertical segments are most likely unequal: one side or the
other is constant so we just make as many copies of the constant side as required
to fit the two together.

We claim that in the resulting sequence of words, each word differs from the next 
by replacing a left-hand side with its corresponding right-hand side. This is clear 
when both words lie on the same segment or on successive segments joined end-to- 
end. In the remaining case, both path details record the same left-hand side, split 
at the same point, and identical splitting points in the corresponding right-hand 
side. 

We still have to show that the left-hand sides at which changes occur are those
that would be chosen by the algorithm. Consider words joined at a left segment.
The left-hand side begins in $v_i^-$ (and ends in it as well, unless it is one of
the left-hand sides on the path). Therefore, from the start of the left-hand side,
the next $W$ letters are the same whether $v_i^-$ is completed by $v_i^+$ or
$w_j^+$ (since these begin with the same $W - 1$ letters). The algorithm will
therefore substitute at the same place in either word. Now consider words joined
at a right segment. The left-hand side ends somewhere in $w_j^+$. Any left-hand
side in $v_i^- w_j^+$, starting to the left of this one, would have to start
within $W - 1$ letters of the end of $v_i^-$, for otherwise it would be a
left-hand side in $v_i$ (wholly to the left of a right segment). But in view of
this, the same left-hand side would appear in $w_j$ (since $w_j^-$ and $v_i^-$ end
with the same $W - 1$ letters).

It follows that we have constructed the history of $v_0^- w_0^+$. That it contains
a copy of the same splitting path, and ends with $v_r^- w_s^+$, is clear. □

We would like to be able to say that $v_r^-$ is determined by $v_0^-$ and the
splitting path but unfortunately this is not quite true. Let $[v_r^-]$ denote the
first word to the left of the last segment in the splitting path. What the proof
of Lemma 6.6 shows is that $[v_r^-]$ is determined by $v_0^-$ and the splitting
path. If the last segment is a right segment, then $v_r^- = [v_r^-]$, but if not
the best we can say is that $v_r^-$ is obtained from $[v_r^-]$ by substitutions
entirely inside the latter. Similar statements hold for $w_s^+$.

6.1. Subwords and border letters. We extend the results of this section to
subwords. If $v_0, \dots, v_t$ is a history of $v_0$, and $w_0$ is a subword of
$v_0$, how shall we define the history of $w_0$? We can do it by fixing a deletion
convention for the rewriting system: for each left-hand side, decide which letters
are deleted and which are changed to get the corresponding right-hand side. It is
then determined, when a substitution takes place over the boundary of
$w_i \subset v_i$, which letters of the right-hand side belong to $w_{i+1}$ and
which do not. More generally we can consider $v_0$ to be split up into arbitrarily
many subwords; a deletion convention will determine how each $v_i$ is to be split
up.

We want to define the diagram of $w_0$'s history in such a way that each subword
gets its own "sub-diagram". In other words, we want the history of
$w_0 \subset v_0$ to occupy a rectangular block underneath $w_0$. Therefore, when
a substitution takes place over a subword boundary, we adjust the widths of the
right-hand side letters on either side of the new boundary to keep it vertically
aligned under the previous boundary. The problem that arises is that a deletion
may occur on one side only of the subword boundary: in that case Lemma 6.1 fails.
We make the following adjustments.

We designate the $W - 1$ letters to either side of a subword boundary as border
letters (see Figure 2). When a right-hand side contains both border and non-border
letters, we assign widths as follows. The number of border letters will be the
same in the right-hand side as in the left-hand side, so we line them up under the
border letters of the left-hand side and keep their widths the same; we expand the
non-border letters to fill the remaining space evenly. Otherwise we assign widths
as previously stated. Note that when a left-hand side contains a subword boundary,
the right-hand side will consist only of border letters. Now Lemma 6.1 holds for
all non-border letters.

We restrict the definition of the generation of a letter to non-border letters:
the generation of a non-border letter in a right-hand side is one more than the
least generation of the non-border letters of the corresponding left-hand side.
With this adjustment, Lemma 6.2 goes through.
Figure 2. A diagram for a history with subwords. Solid vertical lines indicate
subword boundaries. Cross-hatching indicates border letters.



Splitting paths are defined as before except that we forbid any of the words
included in the detail to cross a subword boundary. Lemma 6.4 gives us such a path
since we follow the edges of non-border letters (letters for which the generation
is defined). Clearly a splitting path for $w_0 \subset v_0$ is also one for $v_0$.

We show that the number of splitting paths required to split all subword histories, 
with a given starting length, is bounded by a polynomial function of that starting 
length. This bound is independent of the total number of substitutions in the 
history. It follows that if we have enough histories, two of them will have equivalent 
splitting paths. 

Lemma 6.7. Let $w_0$ be a subword of $v_0$ of length $\ell(w_0) = N$, and let
$w_0, \dots, w_t$ be such that $\ell(w_t) \ge 2W - 1$. Then $w_t$ has a splitting
path in one of at most $C_1 N^{C_2}$ equivalence classes, where $C_1, C_2$ are
positive constants depending only on $|A|$ and $W$. The splitting path can be
chosen to end next to any non-border letter of $w_t$.

Proof. Since $\ell(w_t) \ge 2W - 1$ it contains at least one non-border letter.
Choose one, and let $g$ be its generation. By Lemma 6.2 its width is at least
$\left(\frac{W}{W-1}\right)^g$. But since this cannot exceed $N = \ell(w_0)$, the
width of $w_0$, we have $g \le \log_{W/(W-1)} N$.

By Lemma 6.4 we can find a splitting path, ending next to our chosen letter, of
length at most $2g + 2$. By Remark 6.5 the number of classes of splitting path, of
length $\le 2g + 2$, does not exceed $(2(W+1)^2|A|^{2W+1})^{2g+3}$. Since $g$ is
bounded by a logarithm of $N$, the result follows. □
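
The last step, that an exponential in $g$ becomes a polynomial in $N$, can be made explicit. In the sketch below $c \ge 1$ stands for the base of the bound in Remark 6.5 (a constant depending only on $|A|$ and $W$) and $\rho = W/(W-1)$; these abbreviations, and the resulting choice of $C_1$ and $C_2$, are one admissible choice made here for illustration, not necessarily the constants intended in the text.

```latex
\begin{align*}
g \le \log_\rho N
  \quad\Longrightarrow\quad
c^{\,2g+3}
  &\le c^{3} \cdot \bigl(c^{\log_\rho N}\bigr)^{2} \\
  &= c^{3} \cdot \bigl(N^{\log_\rho c}\bigr)^{2}
   \qquad \text{since } c^{\log_\rho N} = N^{\log_\rho c} \\
  &= C_1 N^{C_2},
   \qquad C_1 = c^{3}, \quad C_2 = 2\log_\rho c .
\end{align*}
```

The identity $c^{\log_\rho N} = N^{\log_\rho c}$ follows by taking $\log_\rho$ of both sides, and both constants visibly depend only on $c$ and $W$.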

We consider now a word divided into two subwords $u_0, v_0$. We keep $v_0$ fixed,
vary $u_0$, and run the algorithm for some amount of time. Intuitively speaking,
our algorithm carries information between the two subwords, giving in principle a
number of possible values for $v_t$ which is exponential in $\ell(v_0)$. We show
that the number of distinct $v_t$ that can actually arise is only polynomial in
$\ell(v_0)$.

Lemma 6.8. Let $v_0$ be a fixed word of length $N > 1$ in $A^*$. For each word
$u_0$ we choose a time $t$ and let $u_t v_t$ be the result of applying $t$
substitutions to $u_0 v_0$; $v_t$ is then a function of $u_0$. There exist
positive constants $C_0, C$, depending only on $|A|$ and $W$, such that
$v_t(u_0)$ takes at most $C_0 N^{C}$ distinct values as $u_0$ varies. The same
bound applies if we instead define $v_t u_t$ to be the result of applying $t$
substitutions to $v_0 u_0$.



Proof. First consider all $v_t$ such that $\ell(v_t) \ge 2W - 1$. Then by Lemma
6.7 each $v_0, \dots, v_t$ has a splitting path ending $W - 1$ letters from the
start of $v_t$, in one of at most $C_1 N^{C_2}$ classes.

Since $[v_t^+]$ is determined by $v_0^+$ and the class of the splitting path, and
$v_0^+$ has at most $N$ possible values (each being a subword at the end of
$v_0$), $[v_t^+]$ takes at most $C_1 N^{C_2+1}$ distinct values as $u_0$ varies.
Since $v_t^+$ is one of the at most $\ell([v_t^+]) \le N$ words obtained by making
substitutions in $[v_t^+]$, $v_t^+$ itself can take at most $C_1 N^{C_2+2}$
values. On the other hand, since $\ell(v_t^-) = W - 1$, this can take at most
$|A|^{W-1}$ values. Multiplying these two gives the required bound on the number
of values $v_t$ can take when $\ell(v_t) \ge 2W - 1$.

The number of possible words of length less than $2W - 1$ is constant and, since
we are assuming $N > 1$, we can absorb this into $C_0$.

The proof for $v_t u_t$ is similar. □

Remark 6.9. If the rewriting system were not required to delete a letter with
every substitution, the number of values $v_t(u_0)$ could take might well be
exponential in $N$.
7. Groups which have no Cannon's Algorithm 

In this section we use the results of the previous section to exhibit groups which 
have no Cannon's algorithm. 

Theorem 7.1. Let $G$ be a group with some fixed generating set. Suppose that for
each $n > 0$ there are sets $S_1(n) \subset G$ and $S_2(n) \subset G$ satisfying
the following:

(1) For $i = 1, 2$ each element of $S_i(n)$ can be represented by a word of length
exactly $n$.

(2) There are $\alpha_0 > 0$ and $\alpha_1 > 1$ so that for infinitely many $n$

$$|S_i(n)| \ge \alpha_0 \alpha_1^n.$$

(3) Each element of $S_1$ commutes with each element of $S_2$.

Then $G$ has no Cannon's algorithm.

Proof. Suppose to the contrary that we have a deterministic Cannon's algorithm for $G$. Let $A$ be the working alphabet and let $W$ be the length of the longest left-hand side. Choose $n > 0$ such that $|S_i(n)| \ge \alpha_0 \alpha_1^n$, and

$$\tfrac{1}{2} \alpha_0 \alpha_1^n > C_1 n^{C_2+2} |A|^{6W} C_0 n^{C},$$

where $C$, $C_0$, $C_1$ and $C_2$ are as in Lemmas 6.7 and 6.8. Let $T_i$ be a set of words of length $n$ representing $S_i(n)$, for $i = 1, 2$.
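To get a feel for how large $n$ must be, the following minimal sketch searches for the least $n$ satisfying the displayed inequality. Every numerical value in it is a hypothetical placeholder: the actual constants come from Lemmas 6.7 and 6.8 and are not computed here, and the function name `least_admissible_n` is our own. The sketch checks only the inequality; the proof also needs $n$ to be one of the infinitely many values for which $|S_i(n)| \ge \alpha_0 \alpha_1^n$.

```python
import math

def least_admissible_n(alpha0, alpha1, c0, c, c1, c2, alphabet_size, w, n_max=10**6):
    """Least n >= 1 with
        (1/2) * alpha0 * alpha1**n > c1 * n**(c2 + 2) * alphabet_size**(6*w) * c0 * n**c.

    Logarithms of both sides are compared so that large exponents do not overflow.
    Returns None if no such n <= n_max is found.
    """
    log_const_rhs = math.log(c1) + math.log(c0) + 6 * w * math.log(alphabet_size)
    for n in range(1, n_max + 1):
        log_lhs = math.log(0.5 * alpha0) + n * math.log(alpha1)
        log_rhs = log_const_rhs + (c2 + 2 + c) * math.log(n)
        if log_lhs > log_rhs:
            return n
    return None

# Hypothetical constants, chosen only for illustration.
n = least_admissible_n(alpha0=1.0, alpha1=2.0, c0=1.0, c=1, c1=1.0, c2=1,
                       alphabet_size=2, w=1)
print(n)
```

With these placeholder values the inequality reads $2^{n-1} > 64 n^4$. The exponential side always wins eventually, which is all the proof uses.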

We shall consider the effect of our supposed Cannon's algorithm on words of the form $u_0 v_0 u_0^{-1} v_0^{-1}$, for $u_0 \in T_1$ and $v_0 \in T_2$. All such words must reduce to the empty word since $S_1(n)$ and $S_2(n)$ commute. Put $x_0 = u_0^{-1}$ and $y_0 = v_0^{-1}$, and let $u_t v_t x_t y_t$ denote the result of applying $t$ substitutions to $u_0 v_0 x_0 y_0$.

Define $t$, as a function of $u_0$ and $v_0$, to be the least integer $t \ge 0$ such that $\max\{\ell(v_t), \ell(x_t)\} < 3W$. I.e., we run the algorithm until the first time at which both $v_t$ and $x_t$ have length less than $3W$. Then $u_t$, $v_t$, $x_t$ and $y_t$ are all well defined functions of $u_0$ and $v_0$. Since at most $W$ letters are deleted in each step it follows that $2W \le \max\{\ell(v_t), \ell(x_t)\} \le 3W - 1$.

For each pair $(u_0, v_0) \in T_1 \times T_2$, one or both of the inequalities $\ell(v_t) \ge \ell(x_t)$, $\ell(x_t) \ge \ell(v_t)$ holds. Therefore one of these inequalities must hold for at least half of $T_1 \times T_2$. We shall suppose it is the first, and argue to obtain a contradiction; were it the second, a similar argument, interchanging the roles of $v_t$ and $x_t$, would give a contradiction instead.

Step 1, fix a $v_0$: Since we are assuming that, for at least half the pairs $(u_0, v_0)$, $\ell(v_t) \ge \ell(x_t)$, we can certainly find a $v_0 \in T_2$ such that for at least half of $u_0 \in T_1$, $\ell(v_t(u_0, v_0)) \ge \ell(x_t(u_0, v_0))$. We fix this $v_0$ and henceforth regard $u_t, v_t, x_t, y_t$ as functions of $u_0$ alone. Let $U = \{u_0 \in T_1 \mid \ell(v_t) \ge \ell(x_t)\}$. By the choices we have made, $|U| \ge \frac{1}{2}|T_1| \ge \frac{1}{2}\alpha_0 \alpha_1^n$.
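The choice of $v_0$ in Step 1 is an averaging argument. Written out as a sketch (the sum over $T_2$ is our way of organising the count, not notation from the lemmas):

```latex
\[
  \sum_{v_0 \in T_2} \bigl|\{u_0 \in T_1 : \ell(v_t) \ge \ell(x_t)\}\bigr|
  \;\ge\; \tfrac12\,|T_1|\,|T_2|
  \quad\Longrightarrow\quad
  \exists\, v_0 \in T_2 :\;
  \bigl|\{u_0 \in T_1 : \ell(v_t) \ge \ell(x_t)\}\bigr| \;\ge\; \tfrac12\,|T_1| .
\]
```

If every term of the sum were smaller than $\frac{1}{2}|T_1|$, the sum would be smaller than $\frac{1}{2}|T_1||T_2|$, contradicting the assumption on pairs.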

Step 2, split the $v_t$ using boundedly many splitting classes: By the definitions of $t$ and $U$, $\ell(v_t) \ge 2W$ for each $u_0 \in U$. By Lemma 6.7 we can choose a splitting path for each $v_0, \ldots, v_t$, using at most $C_1 n^{C_2}$ classes. If we take into account also the position in $v_0$ at which the path begins, and the position in $v_t$ at which the path ends, we get at most $C_1 n^{C_2+2}$ classes.

Step 3, the map $u_0 \mapsto v_t x_t y_t$ is many-to-one: By the definition of $t$, $\ell(v_t), \ell(x_t) \le 3W - 1$, so the number of possible values $v_t x_t$ can take, as $u_0$ ranges over $U$, is less than $|A|^{6W}$. By Lemma 6.8 the number of values $y_t$ can take is at most $C_0 n^{C}$ for positive constants $C_0$ and $C$ depending only on $|A|$ and $W$. On the other hand $|U| \ge \frac{1}{2}\alpha_0 \alpha_1^n$ so, by our choice of $n$, $|U| > C_1 n^{C_2+2} |A|^{6W} C_0 n^{C}$. It follows that there exists a set of more than $C_1 n^{C_2+2}$ $u_0$'s in $U$ which all give the same $v_t x_t y_t$.
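The count in Step 3 is a pigeonhole argument. As a sketch, write $K$ (our notation) for the number of distinct values taken by $v_t x_t y_t$ as $u_0$ ranges over $U$, and let $w$ range over those values:

```latex
\[
  K \le |A|^{6W} \cdot C_0 n^{C},
  \qquad
  \max_{w}\,\bigl|\{u_0 \in U : v_t x_t y_t = w\}\bigr|
  \;\ge\; \frac{|U|}{K}
  \;>\; \frac{C_1 n^{C_2+2}\,|A|^{6W}\, C_0 n^{C}}{|A|^{6W}\, C_0 n^{C}}
  \;=\; C_1 n^{C_2+2}.
\]
```

So some single value of $v_t x_t y_t$ is shared by more than $C_1 n^{C_2+2}$ elements of $U$, which is exactly the number of positioned splitting path classes allowed for in Step 2.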

Step 4, construct a word that breaks the Cannon's algorithm: From Step 3, we have more than $C_1 n^{C_2+2}$ $u_0$'s giving the same $v_t x_t y_t$. From Step 2, we have at most $C_1 n^{C_2+2}$ positioned splitting path classes for $v_t$. Therefore we can find distinct $u_0, u_0' \in U$ such that (writing $v_t'$ for $v_t(u_0')$ etc.) $v_t x_t y_t = v_t' x_t' y_t'$, and $v_0, \ldots, v_t$ and $v_0', \ldots, v_t'$ contain equivalent splitting paths, starting at the same position in $v_0 = v_0'$ and ending at the same position in $v_t = v_t'$.



By Lemma 6.6 running the algorithm on $u_0 v_0^{-} {v_0'}^{+} x_0' y_0'$ yields $u_t v_t^{-} {v_t'}^{+} x_t' y_t'$. Now $u_0 v_0^{-} {v_0'}^{+} x_0' y_0' = u_0 v_0 x_0' y_0$, which does not represent the identity in $G$ but rather $u_0 (u_0')^{-1}$. On the other hand, $u_t v_t^{-} {v_t'}^{+} x_t' y_t' = u_t v_t x_t y_t$ which reduces to the empty word. Therefore this rewriting system does not implement a Cannon's algorithm for $G$. □

In fact, it is not hard to strengthen this to the following. 

Theorem 7.2. Let $G$ be a group with some fixed generating set. Suppose that for each $n > 0$ there are sets $S_1(n) \subset G$ and $S_2(n) \subset G$ satisfying the following:

(1) For $i = 1, 2$ each element of $S_i(n)$ can be represented by a word of length exactly $n$.

(2) There are $\alpha_0 > 0$, $\alpha_1 > 1$ and $\alpha_2 > 0$ so that for all $n$

$$|S_1(n)| \ge \alpha_0 \alpha_1^n$$

and

$$|S_2(n)| \ge \alpha_2 n.$$

(3) Each element of $S_1$ commutes with each element of $S_2$.
Then $G$ has no Cannon's algorithm.

Proof. The proof is very similar to that of Theorem 7.1 except that we have to start with $v_0$ longer than $u_0$. As before, suppose that we have a deterministic Cannon's algorithm for $G$, with working alphabet $A$, and longest left-hand side of length $W$. Choose $n_1$ and $n_2$ such that

(1) $\frac{1}{2}\alpha_0 \alpha_1^{n_1} > C_1 n_2^{C_2+2} |A|^{6W} C_0 n_2^{C}$

and

(2) $\frac{1}{2}\alpha_2 n_2 > C_1 n_1^{C_2+2} |A|^{6W} C_0 n_1^{C}$.

Let $T_i$ be a set of words of length $n_i$ bijecting to $S_i(n_i)$, for $i = 1, 2$.

As before, let $u_t v_t x_t y_t$ denote the result of applying $t$ substitutions to $u_0 v_0 u_0^{-1} v_0^{-1}$. Define $t$, as a function of $u_0, v_0$, such that $2W \le \max\{\ell(v_t), \ell(x_t)\} \le 3W - 1$.

If, for at least half of $(u_0, v_0) \in T_1 \times T_2$, $\ell(v_t) \ge \ell(x_t)$, we can argue as in Steps 1-4, using (1), to find $u_0, u_0'$ which break the algorithm. In the other case, arguing similarly, using (2), we can find $v_0, v_0'$ which break the algorithm. □

Theorem 7.3. Suppose $G$ has exponential growth and the center of $G$ contains an infinite cyclic group. Then $G$ has no Cannon's algorithm.

Proof. We take among our generators for $G$ a letter $z$ denoting a central element of infinite order and a letter $r$ denoting the identity. We can then take $S_1(n) = B(n)$, for if $u$ is a geodesic denoting an element of length $k \le n$, we can denote this element by $u r^{n-k}$. Thus each element of $B(n)$ is represented by a word of length $n$. Likewise, we can take $S_2(n) = \{z^k \mid |k| \le n\}$. We then apply Theorem 7.2. □
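The padding trick in this proof is easy to make concrete. The sketch below represents a word as a list of letter names and pads it with the identity letter; the function name `pad_to_length` and the sample words are hypothetical, chosen only to illustrate how a word of length $k \le n$ becomes one of length exactly $n$ for the same element.

```python
def pad_to_length(word, n, identity_letter="r"):
    """Pad `word` on the right with the identity letter until its length is exactly n."""
    if len(word) > n:
        raise ValueError("word is longer than the target length")
    return list(word) + [identity_letter] * (n - len(word))

# Hypothetical geodesic of length 3 and the central word z^2, both padded to length 5.
padded_u = pad_to_length(["x", "y", "x"], 5)
padded_z = pad_to_length(["z", "z"], 5)
print(padded_u, padded_z)
```

Because $r$ denotes the identity, padding does not change the group element, so hypothesis (1) of Theorem 7.2 holds for both families of sets.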



Corollary 7.4. $F_2 \times \mathbb{Z}$ has no Cannon's algorithm. □

Corollary 7.5. If $G$ is a braid group on 3 or more strands, then $G$ has no Cannon's algorithm. □

Corollary 7.6. If $M$ is a graph manifold one of whose pieces is a non-closed Seifert fibered space, then $\pi_1(M)$ has no Cannon's algorithm. □

Corollary 7.7. If $M$ is a closed 3-manifold modelled on either $\mathbb{H}^2 \times \mathbb{R}$ or $\widetilde{PSL_2}(\mathbb{R})$, then $\pi_1(M)$ has no Cannon's algorithm. □

Corollary 7.8. Thompson's group F has no Cannon's algorithm. 

Proof. Thompson's group F has exponential growth [7] and contains a subgroup 
isomorphic to the direct product of two copies of itself. □ 

We will say that a subgroup $A$ of $G$ has exponential growth in $G$ if $A \cap B(n)$ has exponential growth.

Theorem 7.9. Suppose G has an abelian subgroup which has exponential growth 
in G. Then G has no Cannon's algorithm. 



Proof. In this case we take $S_1(n) = S_2(n) = A \cap B(n)$ and apply Theorem 7.1. □

Corollary 7.10. If $G$ is a Baumslag-Solitar group

$$\langle a, t \mid t a^p t^{-1} = a^q \rangle$$

with $p \ne \pm q$ then $G$ has no Cannon's algorithm.

Proof. It is not hard to see that $\ell(a^n) = O(\log n)$ and consequently, $\langle a \rangle$ has exponential growth in $G$. □
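A quick sanity check of the logarithmic length estimate in the simplest case $p = 1$, $q = 2$: the word $t^k a t^{-k}$ has length $2k + 1$ and represents $a^{2^k}$. The sketch below verifies this in the standard action by affine maps of the line, $a \colon x \mapsto x + 1$ and $t \colon x \mapsto 2x$, which satisfies $t a t^{-1} = a^2$. The helper names are our own, and this is an illustration for one choice of $(p, q)$ rather than a proof of the general statement.

```python
from fractions import Fraction

# An affine map x -> m*x + b is stored as the pair (m, b).
A_GEN = (Fraction(1), Fraction(1))      # a: x -> x + 1
T_GEN = (Fraction(2), Fraction(0))      # t: x -> 2x
T_INV = (Fraction(1, 2), Fraction(0))   # t^{-1}: x -> x/2

def compose(f, g):
    """Return the map 'f after g', i.e. x -> f(g(x))."""
    return (f[0] * g[0], f[0] * g[1] + f[1])

def evaluate(word):
    """Multiply out a word (a list of maps), read left to right as a product."""
    result = (Fraction(1), Fraction(0))  # identity map
    for letter in word:
        result = compose(result, letter)
    return result

def short_word_for_power(k):
    """The word t^k a t^{-k}, which has length 2k + 1."""
    return [T_GEN] * k + [A_GEN] + [T_INV] * k

k = 10
word = short_word_for_power(k)
print(len(word), evaluate(word))  # length 21, translation by 2**10
```

A word of length $2k + 1$ thus reaches the $2^k$-th power of $a$, which is the exponential growth of $\langle a \rangle$ in $G$ used in the proof.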

Corollary 7.11. Suppose $M$ is a closed 3-manifold modelled on solvegeometry. Then $\pi_1(M)$ has no Cannon's algorithm.

Proof. In this case $\pi_1(M)$ contains a finite index subgroup of the form $A \rtimes \mathbb{Z}$ where $A$ is isomorphic to $\mathbb{Z}^2$ and the action of the generator of $\mathbb{Z}$ has eigenvalues $\lambda$ and $\lambda^{-1}$ with the modulus of $\lambda$ greater than 1. It follows that $A$ has exponential growth in $\pi_1(M)$. □
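To illustrate the last step, assume (hypothetically) that the generator $s$ of $\mathbb{Z}$ acts on $A = \mathbb{Z}^2$ by the matrix with rows $(2, 1)$ and $(1, 1)$, which has determinant 1 and eigenvalues $(3 \pm \sqrt{5})/2$, and that $s a s^{-1}$ corresponds to applying this matrix to $a$. Then the word $s^k a s^{-k}$, of length $2k + 1$, represents a vector whose coordinates grow exponentially in $k$. The matrix, the convention for the action and the function names are all assumptions made for this sketch.

```python
def mat_vec(m, v):
    """Apply a 2x2 integer matrix (tuple of rows) to a vector in Z^2."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def conjugated_generator(m, k, v=(1, 0)):
    """Apply m to v a total of k times: the element of Z^2 represented by s^k a s^{-k}."""
    for _ in range(k):
        v = mat_vec(m, v)
    return v

M = ((2, 1), (1, 1))  # hypothetical monodromy matrix with an eigenvalue of modulus > 1

for k in range(6):
    print(2 * k + 1, conjugated_generator(M, k))
```

For this matrix the coordinates are Fibonacci numbers, so exponentially many elements of $A$ lie within a ball of linearly growing radius in the ambient group.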

Combining these with our results on virtually nilpotent and geometrically finite 
groups we have: 

Theorem 7.12. Suppose $M$ is a graph manifold. Then $\pi_1(M)$ has a Cannon's algorithm if and only if none of the following hold:

(1) $M$ is a closed $\mathbb{H}^2 \times \mathbb{R}$, $\widetilde{PSL_2}(\mathbb{R})$ or solvegeometry manifold, or

(2) $M$ has a non-closed Seifert fibered piece.